Abstract: This paper confirms a surprising phenomenon first observed by Wright et al. [1], [2] under a different setting. Suppose we are given $m$ highly corrupted measurements $y = A_{\Omega\cdot}x^* + e^*$. Here $A_{\Omega\cdot}$ is a submatrix whose rows are selected uniformly at random from the rows of an orthogonal matrix $A$, and $e^*$ is an unknown sparse error vector whose nonzero entries may be unbounded. We show that, with high probability, $\ell_1$-minimization recovers the sparse signal of interest $x^*$ exactly from only $m = C\mu^2 k(\log n)^2$ measurements, even if nearly 100% of the measurements are corrupted. Here $k$ is the number of nonzero components of $x^*$ and $\mu = n\max_{ij} A_{ij}^2$. We further guarantee that stable recovery is possible when the measurements are polluted by both gross sparse and small dense errors, $y = A_{\Omega\cdot}x^* + e^* + v$, where $v$ is small dense noise with bounded energy. Numerous simulation results under various settings are also presented to verify the validity of the theory and to illustrate the promising potential of the proposed framework.

Index Terms: Compressed sensing, $\ell_1$-minimization, sparse signal recovery, discrete Fourier transform, (weak) restricted isometry, random matrix, dense error correction.



I. Introduction 

Compressed sensing (CS) has been rigorously studied over the past few years as a revolutionary signal sampling paradigm [?], [?], [?]. In CS, a $k$-sparse signal $x^* \in \mathbb{R}^n$ is measured through a set of linear projections $y_i = \langle a_i, x^*\rangle$, $i = 1, \ldots, m$. The vectors $a_i \in \mathbb{R}^n$ form a matrix $A$ of size $m \times n$. The CS framework advocates collecting significantly fewer measurements than the ambient dimension of the signal ($m \ll n$). To reconstruct $x^*$, a standard $\ell_1$-minimization is proposed to solve the inverse problem

$$\min_{x} \|x\|_1 \quad \text{subject to} \quad y = Ax. \qquad (1)$$

It is well known that if $A$ obeys the Restricted Isometry Property (RIP) [?], then the linear program in (1) faithfully recovers $x^*$. The RIP essentially says that every subset of $k$ or fewer columns of $A$ is approximately an orthogonal system. It has been proven to hold for many types of random measurement matrices [8], [9]. For example, random Gaussian or Bernoulli matrices satisfy the RIP with high probability as long as the number of measurements $m$ is on the order of $k\log n$ [?]. The sub-orthogonal matrix $A_{\Omega\cdot}$, sampled uniformly from an orthogonal matrix $A$, obeys the RIP with high probability when $m$ is on the order of $k\log^4 n$ [?].
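Program (1) becomes a linear program once $x$ is split into nonnegative parts. The following is a minimal sketch under stated assumptions. It assumes SciPy's `scipy.optimize.linprog` with the HiGHS backend is available. The helper name `basis_pursuit`, the Gaussian test matrix and the problem sizes are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve (1): min ||x||_1 subject to Ax = y, recast as a linear program.

    Write x = u - v with u, v >= 0; at the optimum ||x||_1 = sum(u) + sum(v).
    """
    m, n = A.shape
    c = np.ones(2 * n)                  # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])           # constraint: A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


# Illustrative run: a 5-sparse signal in R^100 from 50 Gaussian measurements.
rng = np.random.default_rng(0)
n, m, k = 100, 50, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
x_hat = basis_pursuit(A, A @ x_true)
print(np.max(np.abs(x_hat - x_true)))   # recovery error, typically at solver precision
```

The split $x = u - v$ is the textbook way to express an $\ell_1$ objective linearly. The optimal value equals $\|x\|_1$ because at least one of $u_i, v_i$ is zero at an optimal vertex.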

In many practical applications, the measurements are contaminated by noise. Mathematically, we often observe

$$y = Ax^* + e^*,$$

where $e^* \in \mathbb{R}^m$ is the noise vector. To reconstruct $x^*$ from the observation vector $y$, we solve the following convex program

$$\min_{x} \|x\|_1 \quad \text{subject to} \quad \|y - Ax\|_2 \le \sigma, \qquad (2)$$

where $\sigma$ is an upper bound on the noise level $\|e^*\|_2$, assumed to be known. It has been shown in [?], [10], [11], [12] that if $A$ satisfies the RIP and $\sigma$ is not too large, then the solution $\hat{x}$ of (2) stays close to the true signal $x^*$. This requires the same number of measurements as above. In particular, the authors of [6] proved that the reconstruction error grows proportionally with $\sigma$: $\|\hat{x} - x^*\|_2 \le C\sigma$, where $C$ is a small numerical constant.

This result is elegant when the noise level is low. However, as the noise energy grows, $\hat{x}$ may become very different from $x^*$. Since $\sigma$ must bound $\|e^*\|_2$, even a single grossly corrupted measurement can push $\hat{x}$ arbitrarily far from the true solution. Unfortunately, gross errors and irrelevant measurements are now ubiquitous in modern applications such as image processing and sensor networks. There, some measurements may be severely corrupted by occlusions, sensor failures, transmission errors, etc. [13], [?], [?]. These examples motivate a new problem: recovering a sparse vector $x^*$ from highly corrupted measurements $y = Ax^* + e^*$. Previous approaches [?], [10], [11], [12] consider only a small dense noise term $e^*$. In this paper, by contrast, the entries of $e^*$ can have arbitrarily large magnitude, and their support is assumed to be sparse but unknown. The underlying model was previously developed by Wright et al. [?], motivated by face recognition. In that problem, sparse errors appear because a fraction of the query image $y$ is occluded by glasses, hats, etc. The authors proposed to simultaneously minimize the $\ell_1$-norms of both $x$ and $e$,

$$\min_{x,e} \|x\|_1 + \|e\|_1 \quad \text{subject to} \quad y = Ax + e, \qquad (3)$$

where the columns of the matrix $A$ are associated with training images. To analyze the model, they assume that $A$ follows a Gaussian distribution [?]: the entries of $A$ are i.i.d. $\mathcal{N}(0, 1/m)$ random variables.

As pointed out by Candès and Romberg [14] and Do et al. [15], completely random measurement matrices may not be relevant in many practical compressed sensing applications. First, we may not be able to control the measurement matrix. For instance, in MRI or tomography, the acquisition system makes the measurements inherently frequency based. Second, fully random matrices are computationally expensive and require large memory buffers because they are completely unstructured. These weaknesses rule out fully random sensing matrices in applications where both the acquisition system (encoder) and the reconstruction system (decoder) must have low complexity and fast implementation.

In this paper, we extend the analysis to a special class of measurement matrices constructed from an orthogonal matrix $A$. Let $\Omega$ be a subset of $\{1, \ldots, n\}$, and let the measurement matrix $A_{\Omega\cdot}$ consist of the rows of $A$ indexed by $\Omega$. The observation vector $y$ is now obtained by

$$y = A_{\Omega\cdot}x^* + e^*, \qquad (4)$$

where the signal $x^*$ and the error $e^*$ are sparse vectors with supports $T$ and $S$, respectively. Such sub-orthogonal measurement matrices have been studied carefully as a promising replacement for fully random Gaussian/Bernoulli sensing matrices. Examples include the partial Fourier ensemble [14] and the structurally random matrix (SRM) [15]. However, no previous work guarantees stable reconstruction under highly corrupted sparse error, or under a combination of large sparse error and small dense noise. Providing such guarantees is our most significant technical contribution.
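To make model (4) concrete, here is a minimal sketch of how $A_{\Omega\cdot}$ and a grossly corrupted $y$ might be generated. We assume an orthonormal DCT-II matrix as the orthogonal $A$. The helper names `dct_matrix` and `corrupted_observations`, the sizes, and the error magnitudes are illustrative choices only.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (so A.T @ A = I)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    A = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    A[0, :] = np.sqrt(1.0 / n)
    return A


def corrupted_observations(A, x_true, m, s, rng):
    """Return (y, Omega, e) with y = A[Omega, :] @ x_true + e, as in (4).

    Omega: m distinct row indices chosen uniformly at random.
    e: s nonzero entries at uniformly random positions among the m
       measurements, with random signs and large, arbitrary magnitudes.
    """
    n = A.shape[0]
    Omega = np.sort(rng.choice(n, size=m, replace=False))
    e = np.zeros(m)
    S = rng.choice(m, size=s, replace=False)
    e[S] = rng.choice([-1.0, 1.0], size=s) * rng.uniform(10.0, 1000.0, size=s)
    return A[Omega, :] @ x_true + e, Omega, e


rng = np.random.default_rng(1)
n = 64
A = dct_matrix(n)
x_true = np.zeros(n)
x_true[[3, 17, 40]] = [1.0, -1.0, 1.0]
y, Omega, e = corrupted_observations(A, x_true, m=32, s=8, rng=rng)
```

Nothing in the model restricts the size of the corrupted entries. The uniform range used for `e` is only a stand-in for "arbitrarily large".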

To recover $x^*$ and $e^*$, we propose to solve the following extended $\ell_1$-minimization

$$\min_{x,e} \|x\|_1 + \lambda\|e\|_1 \quad \text{subject to} \quad y = A_{\Omega\cdot}x + e, \qquad (5)$$

where $\lambda > 0$ is a regularization parameter that balances the two $\ell_1$-norm terms.
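Like (1), program (5) is a linear program after splitting $x$ and $e$ into nonnegative parts. Below is a minimal sketch, assuming SciPy's `linprog` (HiGHS) and a partial orthonormal DCT as $A_{\Omega\cdot}$. The helper `extended_l1`, the sizes, and the value `lam=1.0` are illustrative assumptions. In particular, `lam=1.0` is not the $\lambda$ prescribed by the theorems below.

```python
import numpy as np
from scipy.optimize import linprog


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are the DCT basis vectors)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    A = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    A[0, :] = np.sqrt(1.0 / n)
    return A


def extended_l1(A_omega, y, lam):
    """Solve (5): min ||x||_1 + lam*||e||_1 s.t. y = A_omega x + e, as an LP.

    Variables are split into nonnegative parts: x = u - v and e = p - q.
    """
    m, n = A_omega.shape
    I = np.eye(m)
    c = np.concatenate([np.ones(2 * n), lam * np.ones(2 * m)])
    A_eq = np.hstack([A_omega, -A_omega, I, -I])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    z = res.x
    x = z[:n] - z[n:2 * n]
    e = z[2 * n:2 * n + m] - z[2 * n + m:]
    return x, e


# Illustrative run: 128 of 256 DCT rows, a 5-sparse x*, 12 gross errors.
rng = np.random.default_rng(2)
n, m, k, s = 256, 128, 5, 12
A_omega = dct_matrix(n)[np.sort(rng.choice(n, size=m, replace=False)), :]
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
e_true = np.zeros(m)
S = rng.choice(m, size=s, replace=False)
e_true[S] = rng.choice([-1.0, 1.0], size=s) * rng.uniform(10.0, 100.0, size=s)
x_hat, e_hat = extended_l1(A_omega, A_omega @ x_true + e_true, lam=1.0)
print(np.max(np.abs(x_hat - x_true)), np.max(np.abs(e_hat - e_true)))
```

The solver never receives $S$: it must locate the corrupted entries and their magnitudes on its own.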

Surprisingly, with an appropriate choice of $\lambda$, this simple linear program (5) recovers both $x^*$ and $e^*$ exactly. This holds even when the sparsity of $x^*$ grows almost linearly in the dimension of the signal and the errors in $e^*$ affect a constant fraction of all entries. We confirm this observation through rigorous mathematical justification and extensive simulations in the following sections.

A. Motivational applications 

In many important applications, the observations of interest can be modeled as a linear projection of a sparse signal plus a sparse error. Before presenting our main results, we briefly describe several such applications and show how well they fit our model.

• Image inpainting. Given an image $Y$ with missing or corrupted pixels, we would like to reconstruct the original image by filling in the lost information [16]. Suppose the errors are described by a matrix $E$ whose nonzero entries correspond to the missing or corrupted pixels. Then $Y$ decomposes into two components: the original image $B$ and the sparse noise $E$. A key hypothesis frequently made in image inpainting to guarantee satisfactory performance is that the original image $B$ is sparsely represented by a few coefficients over an overcomplete dictionary $D$ [17], [16]. This dictionary is typically a concatenation of orthogonal transforms (e.g. wavelet, Fourier, DCT) or is learned from a set of training images. Let $y$, $b$ and $e$ be the vectorized versions of $Y$, $B$ and $E$. Then $y = Dx + e$, where $x$ is the sparse coefficient vector. Previous works often require the locations of the missing entries to be known in advance. We make no such assumption. Instead, the optimization in (5) estimates both the noisy locations and their magnitudes.

• Compressed sensing for networked data. In sensor networks, the goal is a low-power system that still guarantees reliable transmission. In this setting [18], each sensor collects information about a signal or object $x^*$ by projecting $x^*$ onto a row vector $a_i$ of a sensing matrix $A \in \mathbb{R}^{m\times n}$, i.e. $y_i = \langle a_i, x^*\rangle$. As suggested in [18], $A$ need not be completely random. It is simpler and computationally cheaper to use a matrix that admits a fast implementation and avoids expensive memory buffering, such as the DCT, Hadamard or Fourier transform.

After gathering the data, the sensors send the measured values $y_i$ to their neighbors or to a central hub for analysis and processing. Because the sensors are low cost, it is highly likely that some of them fail to collect data or produce measurements that are not well protected before transmission. Hence, some measurements may be severely corrupted by two types of errors:

$$y = Ax^* + e^* + v,$$

where $e^*$ is the sparse error, whose nonzero entries can be arbitrarily large, and $v$ is dense noise with bounded energy $\sigma$. To recover both $x^*$ and $e^*$, we propose to solve

$$\min_{x,e} \|x\|_1 + \lambda\|e\|_1 \quad \text{s.t.} \quad \|y - Ax - e\|_2 \le \sigma.$$

• Joint source-channel coding. One potential application of CS is joint source-channel coding [19], [20]. In the conventional approach, the source data $x^* \in \mathbb{R}^n$ is first encoded to remove redundancy and then channel-coded for error protection. In CS, by contrast, $x^*$ is encoded by a simple linear projection $y = Ax^*$. To protect against channel errors, the authors of [19] proposed using more measurements than the minimal number CS needs for accurate recovery. Retrieving $x^*$ under channel errors then requires the probabilistic model of the corrupted entries, which is usually unavailable in practice. We believe our approach to recovering such signals is more accessible and more robust.
B. Notations and organization of the paper

We briefly introduce the notation used throughout the paper. For $x \in \mathbb{R}^n$ and an index set $T \subseteq \{1, \ldots, n\}$, $x_T$ denotes the vector of entries of $x$ indexed by $T$. For a subset $\Omega$ of $\{1, \ldots, n\}$, $A_{\Omega\cdot}$ denotes the submatrix of $A$ formed by the rows indexed by $\Omega$. Similarly, $A_{ST}$ denotes the submatrix of $A$ with rows indexed by $S$ and columns indexed by $T$. We reserve the index sets $T$ and $S$ for the supports of the signal $x^*$ and the error $e^*$. Their sparsity levels are $k = |T|$ and $s = |S|$, respectively. For a vector $x$, $\mathrm{sgn}(x)$ denotes the componentwise sign of $x$.

We use several standard vector and matrix norms, listed here for completeness. For $x \in \mathbb{R}^n$, $\|x\|_1 = \sum_{i=1}^n |x_i|$ is the $\ell_1$-norm, $\|x\|_2$ is the $\ell_2$-norm and $\|x\|_\infty = \max_i |x_i|$ is the $\ell_\infty$-norm. For a matrix $B$, we only use the spectral norm, denoted by $\|B\|$.
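For readers who want to map this notation onto code, here is a small NumPy sketch. The particular vectors, matrices and index sets are arbitrary illustrations. NumPy's `np.sign` uses the convention $\mathrm{sgn}(0) = 0$, which we assume matches the intended one.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
l1 = np.sum(np.abs(x))              # ||x||_1   = 7
l2 = np.linalg.norm(x)              # ||x||_2   = 5
linf = np.max(np.abs(x))            # ||x||_inf = 4
sgn = np.sign(x)                    # componentwise sign; numpy uses sgn(0) = 0

A = np.arange(16.0).reshape(4, 4)
T, S, Omega = [0, 2], [1, 3], [0, 1, 3]
x_T = x[T]                          # x_T: entries of x indexed by T -> [3, 0]
A_rows = A[Omega, :]                # A_{Omega .}: rows of A indexed by Omega
A_ST = A[np.ix_(S, T)]              # A_{ST}: rows in S, columns in T -> [[4, 6], [12, 14]]
spectral = np.linalg.norm(np.diag([3.0, 1.0]), 2)   # ||B|| = largest singular value = 3
```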

We denote by $C, C_1, c, c_1, \ldots$ positive absolute constants. Finally, an event occurs "with high probability" if it occurs with probability at least $1 - cn^{-1}$.

The remainder of this paper is structured as follows. Section II introduces the main results. Section III describes the structure of our proof. Sections IV and V present supporting results. Section VI compares our results with the oracle that knows the locations of the signal and error supports in advance. Section VII demonstrates the consistency of our results through extensive simulations. Finally, Section VIII summarizes the paper and makes some closing remarks.

II. Main results 

A. Sparse model 

We begin with the easier problem, in which the signal $x^*$ is exactly $k$-sparse and the observation vector $y$ is corrupted only by sparse error. Later in this section, we treat the harder problem of a non-sparse signal $x^*$ and an observation $y$ corrupted by both sparse and dense noise. To this end, we denote the sparsity levels of $x^*$ and $e^*$ by $k$ and $s$ and introduce the $(k, s)$-sparse model, defined as follows:

• The signs of $x^*$ on the support $T$ are independent and equally likely to be $1$ or $-1$.

• The support $S$ of $e^*$ is uniformly distributed among all sets of cardinality $s$ in $\Omega$.
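A draw from the $(k, s)$-sparse model can be sketched as follows. The hypothetical helper `sample_ks_model` keeps $T$ and the magnitudes $|x^*_T|$ fixed, randomizes only the signs on $T$, and picks $S$ uniformly among the $s$-subsets of $\Omega$. The model says nothing about the error magnitudes, so the sketch leaves them out. All sizes are illustrative.

```python
import numpy as np


def sample_ks_model(n, T, magnitudes, Omega, s, rng):
    """Draw x* and the error support S according to the (k, s)-sparse model.

    The support T and magnitudes |x*_T| are fixed; only the signs on T are
    random (i.i.d., +1 or -1 with equal probability). S is a uniformly random
    subset of Omega with cardinality s.
    """
    x = np.zeros(n)
    signs = rng.choice([-1.0, 1.0], size=len(T))
    x[T] = signs * np.asarray(magnitudes, dtype=float)
    S = np.sort(rng.choice(np.asarray(Omega), size=s, replace=False))
    return x, S


rng = np.random.default_rng(3)
Omega = np.arange(0, 50, 2)          # 25 observed locations (illustrative)
x, S = sample_ks_model(50, [2, 7, 11], [1.0, 2.0, 3.0], Omega, 5, rng)
```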

The random sign assumption on $x^*$ over the support $T$ is typical in compressed sensing [14]. It is a sufficient rather than a necessary condition, adopted only to simplify our proof. Indeed, at the cost of an extra $\log n$ factor in the number of measurements, similar results hold for arbitrary signs of $x^*$. We refer interested readers to a recent paper [21] for details.

B. Exact recovery as measurements are corrupted by sparse noise

Theorem 1. Let $x^*$ be a fixed vector in $\mathbb{R}^n$, and let $A$ be an $n \times n$ orthonormal matrix ($A^*A = I$) with $|A_{ij}|^2 \le \mu/n$, where $1 \le \mu \le n$. Assume that $(x^*, e^*)$ is taken from the $(k, s)$-sparse model. Suppose we observe $m$ entries of the projection $Ax^*$ at locations $\Omega$ sampled uniformly at random, and these entries are then corrupted by the noise $e^*$. Then there exist numerical constants $c$ and $C$ such that the following holds with probability at least $1 - cn^{-1}$. The convex program (5) with $\lambda = \sqrt{n/(0.9\,\mu m\log n)}$ correctly recovers both the signal and the error (i.e. $\hat{x} = x^*$ and $\hat{e} = e^*$), provided that

$$m \ge C\mu^2 k(\log n)^2 \quad \text{and} \quad s \le \gamma m, \qquad (6)$$

for any $\gamma$ close to $0.9$.

In other words, Theorem 1 conveys a surprising message. A sparse signal $x^*$ can be recovered, with probability converging to one, from arbitrary and completely unknown corruption patterns, as long as their locations are randomly distributed. We place no assumption on the magnitudes or signs of the nonzero entries of $e^*$; in fact, their magnitudes can be arbitrarily large. Theorem 1 is generic in the sense that it only requires the signs of the nonzero entries of $x^*$ to be uniformly distributed; everything else is deterministic. We believe that the random assumption on the sign pattern is artificial and can be removed. Indeed, when $A$ is a Fourier matrix, the advanced techniques in [4] yield Theorem 1 for all $x^*$ supported on $T$. Whether this result also holds for other orthogonal sensing matrices is an interesting open problem.

Some further remarks on Theorem 1 are in order. First, higher probabilities of success, of the form $1 - cn^{-\beta}$ with $\beta > 1$, can be obtained by increasing the number of observations by a factor of $\beta$. Next, the guarantee is not uniform. For a particular draw of $\Omega$, exact recovery holds with high probability only for an arbitrary but fixed sparse signal (whose signs on the support are uniformly distributed). To guarantee perfect recovery for all sparse signals at once, we might have to require stronger properties of $A_{\Omega\cdot}$, such as the RIP [6] or a similar condition. As shown in [8], [?], $A_{\Omega\cdot}$ obeys the RIP with high probability only if the number of measurements exceeds $Ck\log^4 n$. This is far more demanding than our requirement. By relaxing the RIP, we significantly reduce the number of measurements needed while still guaranteeing perfect recovery, even when the data are highly corrupted.

Theorem 1 also shows that the signal can be recovered precisely in the presence of gross errors using only a $\log n$ factor more observations than the optimal number in compressed sensing. The following theorem goes further. With the same order of $k\log n$ measurements as in compressed sensing, $\ell_1$-minimization still recovers both the sparse signal and the high-energy sparse noise exactly. In particular, the theorem reveals an interesting relationship among the signal sparsity, the error sparsity and the parameter $\lambda$.

Theorem 2. Let $x^*$ be a fixed vector in $\mathbb{R}^n$, and let $A$ be an $n \times n$ orthonormal matrix ($A^*A = I$) with $|A_{ij}|^2 \le \mu/n$, where $1 \le \mu \le n$. Assume that $(x^*, e^*)$ is taken from the $(k, s)$-sparse model. Suppose we observe $m$ entries of the projection $Ax^*$ at locations $\Omega$ sampled uniformly at random, and these entries are then further corrupted by the noise $e^*$. Then there exist numerical constants $c$, $C_1$ and $C_2$ such that the following holds with probability at least $1 - cn^{-1}$. The convex program in (5) with $\lambda = \sqrt{n/(\gamma\mu m\log n)}$, $\gamma \in (0, 1)$, correctly recovers both the signal and the error (i.e. $\hat{x} = x^*$ and $\hat{e} = e^*$), provided that

$$m \ge C_1\mu^2\max\left\{\frac{\gamma}{(1-\gamma)^2}k(\log n)^2,\; k\log n,\; (\log n)^2\right\} \qquad (7)$$

and

$$s \le C_2\gamma m. \qquad (8)$$

Theorem 2 implies Theorem 1 by setting $\gamma = 0.9$, or equivalently $\lambda = \sqrt{n/(0.9\,\mu m\log n)}$. The remainder of the paper therefore focuses on proving Theorem 2.

The parameter $\mu$ plays an important role. It is the incoherence of the matrix $A$ and measures how concentrated or spread out the rows of the measurement matrix $A_{\Omega\cdot}$ are. Since $A$ is orthonormal, $\mu$ ranges between $1$ and $n$. In the worst case, the rows of $A$ are maximally concentrated, so $\mu = n$ and $A$ is, for example, the identity matrix. Then the measurements are entries of $x^*$ itself, and even a single gross error destroys the corresponding entry. Observing all $n$ measurements does not help. At the other extreme, $\mu = 1$: the entries of $A$ are perfectly spread out, and the number of measurements attains its minimum value.
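The two extremes just described can be checked numerically. The sketch below computes $\mu = n\max_{ij}|A_{ij}|^2$ for the identity, for the unitary DFT matrix, and for a random orthogonal matrix. The helper name `incoherence` and the size $n = 64$ are our own illustrative choices.

```python
import numpy as np


def incoherence(A):
    """mu = n * max_ij |A_ij|^2 for an n x n orthonormal (or unitary) matrix A."""
    n = A.shape[0]
    return n * np.max(np.abs(A) ** 2)


n = 64
identity = np.eye(n)                          # maximally concentrated rows
dft = np.fft.fft(np.eye(n)) / np.sqrt(n)      # unitary DFT: perfectly spread rows
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))

print(incoherence(identity))                  # 64.0, i.e. mu = n
print(incoherence(dft))                       # 1.0 up to rounding, i.e. mu = 1
print(incoherence(Q))                         # somewhere between 1 and n
```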

The parameter $\lambda$ in (5) controls the balance between the two terms $\|x\|_1$ and $\|e\|_1$. A large $\lambda$ is expected to recover a signal with denser support under sparser error. A small $\lambda$ is better when the error is denser and the signal is sufficiently sparse. Theorem 2 confirms this mathematically. For example, choose $\gamma = 1/\log n$. Then, with only $m = Ck\log n$ measurements, the linear program (5) recovers the $k$-sparse $x^*$ exactly. It also correctly identifies noise of arbitrarily large magnitude, as long as the noise sparsity is proportional to $m/\log n$. At the other end, set $\gamma$ close to one. Then (5) retrieves $x^*$ with support size up to $m/(C(\log n)^2)$, under error whose support covers up to a constant fraction of all the measurements. In fact, the theorem provides a whole range of admissible values of $\lambda$. The choice can be guided by whatever prior information is available about the sparsity levels of the signal and the noise.
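The trade-off governed by $\gamma$ can be tabulated directly from the sampling condition (7). The unknown constant $C_1$ is omitted, so only ratios between settings are meaningful. The sketch also evaluates $\lambda = \sqrt{n/(\gamma\mu m\log n)}$. This formula is our reading of the extraction-damaged statement of Theorem 2 and should be treated as an assumption. The helper names and the values of `n`, `k`, `mu`, `m` are illustrative only.

```python
import math


def lam_from_gamma(n, m, mu, gamma):
    """lambda = sqrt(n / (gamma * mu * m * log n)).

    Assumption: this is our reading of the lambda in Theorem 2.
    """
    return math.sqrt(n / (gamma * mu * m * math.log(n)))


def measurement_bracket(n, k, mu, gamma):
    """mu^2 * max{gamma/(1-gamma)^2 * k (log n)^2, k log n, (log n)^2} from (7).

    The unknown constant C1 is omitted, so only ratios between settings matter.
    """
    L = math.log(n)
    return mu ** 2 * max(gamma / (1.0 - gamma) ** 2 * k * L ** 2, k * L, L ** 2)


n, k, mu, m = 4096, 20, 1.0, 1000
L = math.log(n)
# gamma = 1/log n: about k log n measurements, but only s <= C2 * m / log n errors.
low = measurement_bracket(n, k, mu, 1.0 / L)
# gamma = 0.9: about 90 k (log n)^2 measurements, but s up to a constant fraction of m.
high = measurement_bracket(n, k, mu, 0.9)
print(low / (k * L), high / (k * L ** 2))
print(lam_from_gamma(n, m, mu, 1.0 / L), lam_from_gamma(n, m, mu, 0.9))
```

Smaller $\gamma$ gives a larger $\lambda$. This matches the heuristic above: $\|e\|_1$ is penalized more heavily, so a sparser error and a denser signal are tolerated.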

C. Stable recovery as measurements are corrupted by both dense and sparse errors

Theorem 2, although interesting, is limited to exactly sparse noise. In practical applications, the observations $y$ are often also contaminated by dense noise, which can be deterministic or stochastic. In this section, we investigate the model in which the observations are corrupted by two kinds of noise. The first is unknown dense noise $v$ with small energy, $\|v\|_2 \le \sigma$. The second is the sparse noise $e^*$, whose nonzero entries can have arbitrarily large magnitudes:

$$y = A_{\Omega\cdot}x^* + e^* + v.$$

To ease the presentation of our results and proof technique, we first consider the special case in which the observation $y$ is corrupted only by dense error with energy bounded by $\sigma$. The problem is then to recover $x^*$ from the noisy observation

$$y = A_{\Omega\cdot}x^* + v.$$

To recover $x^*$, the well-established approach is to solve the following convex program

$$\min_{x} \|x\|_1 \quad \text{subject to} \quad \|y - A_{\Omega\cdot}x\|_2 \le \sigma. \qquad (9)$$

When the observation vector $y$ is clean, Candès and Romberg [14] showed that $\ell_1$-minimization recovers $x^*$ exactly. In this section, we extend their result and prove that the convex program is stable with respect to perturbations, even when the observations $y$ are imperfect. In particular, the recovery error is bounded by a quantity proportional to $\sigma$. To the best of our knowledge, this is the first robust recovery bound for measurements taken from sub-orthogonal matrices and corrupted by deterministic noise.

Theorem 3. Under the same assumptions as in Theorem 2, suppose there exists a numerical constant $C$ such that $m \ge C\mu k\log n$. Then, for any perturbation $v$ with $\|v\|_2 \le \sigma$, the solution $\hat{x}$ of the convex program (9) obeys

$$\|\hat{x} - x^*\|_2 \le 8\sigma\sqrt{n}\,(1 + 2n/m) + 2\sigma. \qquad (10)$$

Roughly speaking, Theorem 3 covers the family of matrices $A_{\Omega\cdot}$ constructed from any unitary matrix $A$. For this family, minimizing the $\ell_1$-norm stably recovers $x^*$ from just $O(\mu k\log n)$ measurements. As a direct consequence, the solution of (9) becomes exact as $\sigma$ tends to zero, in agreement with Candès and Romberg's result [14]. Moreover, our result holds for any deterministic noise $v$. While preparing this manuscript, we learned of an independent investigation of this problem by Candès and Plan [21]. They place stochastic assumptions on $v$ (e.g. that $v$ is Gaussian) and thereby obtain an improved error bound.

A more challenging situation arises when the observations are contaminated by dense noise with small energy and, in addition, by sparse noise with arbitrarily large magnitude. This model includes the settings of Theorems 2 and 3 as special cases:

$$y = A_{\Omega\cdot}x^* + e^* + v. \qquad (11)$$

To recover $x^*$ (as well as $e^*$), we propose to solve the following convex program

$$\min_{x,e} \|x\|_1 + \lambda\|e\|_1 \quad \text{s.t.} \quad \|y - A_{\Omega\cdot}x - e\|_2 \le \sigma, \qquad (12)$$

where $\sigma$ is the known bound on the energy of the noise $v$.

Theorem 4. Under the same assumptions as in Theorem 2, suppose there exist numerical constants $C_1$ and $C_2$ such that

$$m \ge C_1\mu^2\max\left\{\frac{\gamma}{(1-\gamma)^2}k(\log n)^2,\; k\log n,\; (\log n)^2\right\} \qquad (13)$$

and

$$s \le C_2\gamma m. \qquad (14)$$

Then the pair of solutions $(\hat{x}, \hat{e})$ of the convex program (12) obeys

$$\|\hat{x} - x^*\|_2 + \|\hat{e} - e^*\|_2 \le \cdots \qquad (15)$$

Theorem 4 is significant because it shows that the convex program in (12) reliably reconstructs the sparse signal even when the measurements are severely corrupted by both gross sparse and small dense errors. This is the situation most likely to be encountered in practical applications. When the measurement $y$ is free of dense noise $v$, the signal is reconstructed perfectly, regardless of how large the sparse noise is, as stated in Theorem 2. In Section VI, we compare this reconstruction error with the oracle situation, in which the support $T$ of the signal $x^*$ and the support $S$ of the sparse error $e^*$ are known a priori. We show that the error is optimal up to a $\sqrt{n}$ factor. In particular, ignoring this $\sqrt{n}$ factor, the bound (15) cannot be improved.



The preceding results focus on exactly sparse signals. We now consider perhaps the most general setting. Here $x^*$ is not exactly sparse but is well approximated by a sparse vector. The observation vector $y$ is corrupted by both sparse error and dense noise with noise level $\sigma$. Let $x_k^* \in \mathbb{R}^n$ be the vector containing the $k$ largest-magnitude entries of $x^*$ and zeros elsewhere. Assuming the signs of $x_k^*$ on its support $T$ are uniformly distributed, we establish the following corollary.

Corollary 1. Under the same assumptions as in Theorem 2, the pair of solutions $(\hat{x}, \hat{e})$ of the convex program (12) obeys

$$\|\hat{x} - x^*\|_2 \le \cdots \frac{8(\lambda + 1)}{\cdots} \cdots + 2\sigma. \qquad (16)$$



Ignoring the $2\sigma$ term, the bound in Corollary 1 splits naturally into two terms. The first is the data error associated with the noise $v$. The second is the approximation error, measuring how far the signal $x^*$ is from its best $k$-sparse approximation $x_k^*$.
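The best $k$-sparse approximation $x_k^*$ keeps the $k$ largest-magnitude entries. Below is a minimal sketch of computing it together with a tail norm. We assume the usual $\ell_1$ tail $\|x^* - x_k^*\|_1$ as the approximation-error measure. The exact norm appearing in (16) is not recoverable from the text above, and the helper name `best_k_term` is hypothetical.

```python
import numpy as np


def best_k_term(x, k):
    """Return x_k: the k largest-magnitude entries of x, zeros elsewhere."""
    xk = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[::-1][:k]    # indices of the k largest |x_i|
    xk[idx] = x[idx]
    return xk


x = np.array([0.1, -5.0, 0.02, 3.0, -0.4])
xk = best_k_term(x, 2)            # keeps -5.0 and 3.0
tail = np.sum(np.abs(x - xk))     # ||x - x_k||_1 = 0.1 + 0.02 + 0.4
```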

D. When error is sparsified under an arbitrary basis 

So far, we have only considered errors $e^*$ that are sparse in the canonical basis, i.e. under the identity transformation. A natural generalization is to let $e^*$ be sparse under an arbitrary orthogonal transformation $D$, which includes the former as a special case. Mathematically, we consider the observation model

$$y = A_{\Omega\cdot}x^* + Dg^* + v, \qquad (17)$$

where $e^* = Dg^*$ and $g^*$ is an $s$-sparse vector. Multiplying by $D^*$ reduces this setting to the form of (11):

$$D^*y = D^*A_{\Omega\cdot}x^* + g^* + D^*v.$$



Since $D$ is orthogonal, $D^*A_{\Omega\cdot}$ also has orthonormal rows, and $\|D^*v\|_2 = \|v\|_2$. Therefore, all preceding theorems remain relevant in this setting. The parameter $\mu$ is now interpreted as the mutual incoherence between the sensing matrix $A_{\Omega\cdot}$ and the sparsifying transform $D$:

$$\mu = n\max_{i,j}|\langle a_i, d_j\rangle|^2, \qquad (18)$$

where $a_i$ and $d_j$ are the columns of $A_{\Omega\cdot}$ and $D$, respectively. The smaller the incoherence between these two matrices, the fewer measurements are needed for stable recovery. Intuitively, $y$ is easier to decompose into $x^*$ and $g^*$ when the column spaces of $A_{\Omega\cdot}$ and $D$ are well separated.
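The reduction from (17) to the canonical-basis model can be verified numerically. In this sketch, random orthogonal matrices obtained by QR factorization stand in for $A$ and $D$. The sizes, the sparse $x^*$ and $g^*$, and the noise level are illustrative assumptions. The code checks the rotated identity and evaluates (18) as $n$ times the largest squared entry of $D^*A_{\Omega\cdot}$, whose $(j, i)$ entry is $\langle d_j, a_i\rangle$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 32, 16
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))       # an n x n orthogonal matrix A
A_omega = Q[np.sort(rng.choice(n, size=m, replace=False)), :]
D, _ = np.linalg.qr(rng.standard_normal((m, m)))       # orthogonal m x m transform D

x = np.zeros(n)
x[[1, 9]] = [1.0, -2.0]
g = np.zeros(m)
g[[0, 5]] = [30.0, -40.0]                              # s-sparse g*, large entries
v = 1e-3 * rng.standard_normal(m)                      # small dense noise
y = A_omega @ x + D @ g + v                            # model (17)

# Rotate by D^*: the error becomes sparse in the canonical basis, as in (11).
y_rot = D.T @ y
A_rot = D.T @ A_omega
mu = n * np.max(np.abs(A_rot) ** 2)                    # (18): n * max |<a_i, d_j>|^2
```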

E. Contribution and connections to previous works 

Wright et al. first formulated the problem of recovering a signal from grossly corrupted measurements in an appealing practical paper [22], and analyzed it further in [2]. To exploit the sparsity of $e^*$, they proposed to solve

$$\min_{x,e} \|x\|_1 + \|e\|_1 \quad \text{subject to} \quad y = Ax + e. \qquad (19)$$



The result of [2] is asymptotic in nature. The authors showed that, when $n$ is extremely large and $x^*$ is extremely sparse, (19) can recover both $x^*$ and $e^*$ exactly. This holds for almost any error whose support fraction is bounded away from 100%. Their analysis relies on a Gaussian model for the matrix $A$. Specifically, the columns $a_i$ of $A$ are assumed to be distributed as $\mathcal{N}(\mu, \frac{\nu^2}{m}I_m)$, where $\|\mu\|_2 = 1$ and $\|\mu\|_\infty \le Cm^{-1/2}$ (here $\mu$ denotes a mean vector, not the incoherence). Furthermore, for sufficiently large $m$, they require the sparsity of $x^*$ to grow sublinearly with $m$. This is far from the optimal bound, in which $k$ grows almost linearly in $m$, i.e. on the order of $m/\log n$.

One appealing consequence of their analysis is an explicit relationship among three important quantities of $A \in \mathbb{R}^{m\times n}$: the dimension ratio $\delta$, the error fraction $\rho$ and the signal support density $\alpha$. However, this relationship is difficult to interpret because the three quantities are coupled in a complicated way.

Building on the idea of [2], Li et al. [20] and Laska et al. [23] proposed different applications of the same framework. The former considered joint source-channel coding. The latter proposed the so-called pursuit of justice model to deal with sparse unbounded noise. Both showed that when the measurement matrix $A$ obeys the restricted isometry property (RIP), the concatenated matrix $[A, I]$ also satisfies the RIP with high probability, where $I$ is the identity matrix. Consequently, the signal is recovered perfectly as long as the signal and error sparsity levels are on the order of $m/\log n$. The main drawback of these papers is that they cannot guarantee perfect recovery when the number of corrupted entries is linearly proportional to the total number of observations.

After the initial submission of our paper to arXiv, we learned of two other independent investigations of this problem: Studer et al. [24] and Li [25]. The former studies the more general observation model $y = Ax + De$, where $A$ and $D$ are general matrices. Its deterministic guarantees are weaker than our results in Theorem 1. The latter paper uses different proof techniques to obtain results similar to Theorem 1 under a more general model of the sensing matrix $A_{\Omega\cdot}$. In that model, the rows of $A_{\Omega\cdot}$ are sampled independently from a population $F$ obeying $\mathbb{E}\,aa^* = I$. However, neither paper investigates the more realistic model in which both sparse and dense noise are present in the observations.

In another direction, and much earlier, Candès and Tao investigated the error correction problem [?]. Here, the task is to reconstruct the input vector $x^*$ from corrupted measurements $y = Bx^* + e^*$. The coding matrix $B \in \mathbb{R}^{m\times n}$ is required to be overcomplete ($m > n$), and $e^*$ is the channel corruption vector, usually assumed to be sparse. They proposed to retrieve $x^*$ by solving the following $\ell_1$-minimization problem

$$\min_{x} \|y - Bx\|_1. \qquad (20)$$
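Program (20) is also a linear program: introduce slack variables $t \ge |y - Bx|$ componentwise and minimize $\sum_i t_i$. The following is a minimal sketch under stated assumptions. It relies on SciPy's `linprog` (HiGHS), uses a Gaussian coding matrix $B$ as an illustrative stand-in, and introduces the hypothetical helper `l1_decode`. Unlike the programs above, $x$ itself is not assumed sparse.

```python
import numpy as np
from scipy.optimize import linprog


def l1_decode(B, y):
    """Solve (20): min_x ||y - Bx||_1 as an LP over (x, t) with |y - Bx| <= t."""
    m, n = B.shape
    I = np.eye(m)
    c = np.concatenate([np.zeros(n), np.ones(m)])      # minimize sum(t)
    A_ub = np.vstack([np.hstack([-B, -I]),              #  y - Bx <= t
                      np.hstack([B, -I])])              # -(y - Bx) <= t
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * n + [(0, None)] * m       # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:n]


rng = np.random.default_rng(5)
n, m, s = 20, 80, 6
B = rng.standard_normal((m, n))          # overcomplete coding matrix (m > n)
x_true = rng.standard_normal(n)          # a dense input vector
e = np.zeros(m)
e[rng.choice(m, size=s, replace=False)] = 100.0 * rng.choice([-1.0, 1.0], size=s)
x_hat = l1_decode(B, B @ x_true + e)
```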

Although it shares the same general $\ell_1$ framework, our approach departs from previous work in compressed sensing in several respects:

1) Wright and Ma [2] analyze Gaussian measurement matrices, whereas we study sub-orthogonal matrices. These matrices often have clear advantages over Gaussian matrices in terms of fast and efficient computation [14], [15]. Furthermore, we investigate the harder problem in which both sparse and dense errors appear in the observations, a model not studied in [2]. We show the surprising fact that the extended $\ell_1$-minimization is stable under both perturbations. This holds even if the sparse error is arbitrarily large and its support size is arbitrarily close to the total number of observations. A straightforward consequence is that accurate recovery is achieved when the measurements are not perturbed by dense noise.

2) Our model differs from that of Candès and Tao in two respects. First, we allow the coding matrix to be underdetermined, that is, $m < n$. Second, the input vector is assumed to be sparse. Eliminating $e = y - A_{\Omega\cdot}x$, we can recast the extended $\ell_1$-minimization in (5) as

$$\min_{x} \|x\|_1 + \lambda\|y - A_{\Omega\cdot}x\|_1. \qquad (21)$$

This form combines two $\ell_1$-norms in a single optimization. One imposes sparsity on the input vector, and the other exploits error sparsity as in (20).

3) We propose a minor but subtle modification of the extended $\ell_1$-minimization of [2]. Adding the regularization parameter $\lambda$ in (5) lets us balance the $\ell_1$-norms of $x$ and $e$. Specifically, we establish an explicit relationship among $\lambda$ and the sparsity levels of the signal and the error. This relationship is intuitively interpretable: the admissible signal and error sparsity levels are inversely related. If more measurements are corrupted (equivalently, the error is denser), we expect to recover only signals with smaller support. Conversely, when fewer errors appear in the measurement vector, we can recover signals with larger support. In practice, when the fraction of errors is unknown, a good all-purpose choice is $\lambda = \sqrt{2n/(\mu m\log n)}$.

III. Structure of our proof 

A. Bernoulli model and derandomization technique 

The Bernoulli model. Theorem 2 concerns sets $\Omega$ and $S$ of sizes $m$ and $s$ sampled uniformly at random. We find it more convenient to prove the theorem for subsets $\Omega$ and $S$ sampled according to the Bernoulli model, which makes the measurements statistically independent. The argument in [4], [13] shows that the probability of "failure" under the uniform model is at most twice the probability of failure under the Bernoulli model. Here "failure" means that the optimization in (5) does not recover the signal exactly. From now on, we therefore consider $\Omega = \{i \in \{1, \ldots, n\} : \delta_i = 1\}$. Here $\{\delta_i\}_{1\le i\le n}$ are i.i.d. Bernoulli random variables taking the value one with probability $\eta$ and zero with probability $1 - \eta$. The parameter $\eta$ is chosen so that the expected cardinality of $\Omega$ is $\eta n = m$. Similarly, let $S = \{i \in \Omega : \delta_i' = 1\}$, where $\{\delta_i'\}_{i\in\Omega}$ are i.i.d. Bernoulli random variables with $\mathbb{P}(\delta_i' = 1) = \rho$. The expected cardinality of $S$ is then $\rho m = s$. Henceforth, we write $\Omega \sim \mathrm{Ber}(\eta)$ as shorthand for $\Omega$ sampled from the Bernoulli model with parameter $\eta$.

The following five index sets are used frequently in the sequel.

• $\Omega$: the locations corresponding to observations, $\Omega \sim \mathrm{Ber}(\eta)$ with $\eta = m/n$.

• $S \subset \Omega$: the locations where measurements are available but completely unreliable. The distribution of $S$ depends on that of $\Omega$. Conditioned on $\Omega$, we have $S \sim \mathrm{Ber}(\rho)$ with $\rho = s/m$. We can also regard $S$ as a subset of $\{1,\dots,n\}$ selected with parameter $\eta\rho$, that is, $S \sim \mathrm{Ber}(\eta\rho)$.

• $J \subset \Omega$: the locations where measurements are available and trustworthy, so that $J = \Omega \setminus S$. Conditioned on $\Omega$, we have $J \sim \mathrm{Ber}(1-\rho)$. In other words, $J \sim \mathrm{Ber}(\rho_0)$ with $\rho_0 := \eta(1-\rho)$.

• The complements $S^c = \{1,\dots,n\} \setminus S$ and $J^c = \{1,\dots,n\} \setminus J$.

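As a minimal sketch of the sampling scheme above, the index sets can be drawn as follows. It assumes NumPy; the helper name `sample_index_sets` and the sizes `n`, `m`, `s` are illustrative choices, not taken from the paper.

```python
import numpy as np

def sample_index_sets(n, m, s, rng):
    """Draw (Omega, S, J) under the Bernoulli model.

    Omega ~ Ber(eta) with eta = m/n; conditioned on Omega, each observed
    index joins S with probability rho = s/m; J holds the remaining
    observed indices, so J ~ Ber(rho0) with rho0 = eta * (1 - rho).
    """
    eta, rho = m / n, s / m
    omega = np.flatnonzero(rng.random(n) < eta)   # delta_i ~ Ber(eta)
    in_S = rng.random(omega.size) < rho           # delta'_i ~ Ber(rho)
    return omega, omega[in_S], omega[~in_S]

rng = np.random.default_rng(0)
n, m, s = 4096, 500, 125                          # one quarter of m corrupted
omega, S, J = sample_index_sets(n, m, s, rng)
rho0 = (m / n) * (1 - s / m)
print(omega.size, S.size, J.size)                 # random sizes near 500, 125, 375
print(rho0 * n)                                   # expected |J| = m - s = 375.0
```

Because $S$ is drawn only among observed indices, $S$ and $J$ partition $\Omega$ by construction, while their sizes fluctuate around $s$ and $m-s$; this independence across indices is what the Bernoulli model buys in the proofs.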
Derandomization. In Theorem 2 the signs of $e^*$ are fixed. In the proof, however, we need an additional assumption on $e^*$: its signs are uniformly distributed, taking the values $1$ and $-1$ with probability $1/2$ each. By the same derandomization technique as in [13], the probability of recovering $e^*$ whose signs on the support $S$ are arbitrary is at least the probability of recovering $e^*$ whose signs are equally likely to be $1$ or $-1$. This is formally stated in the lemma below.

Lemma 1 (Theorem 2.3 of [13]). Suppose $x^*$ obeys the conditions of Theorem 2, the locations of the nonzero entries of $e^*$ follow the Bernoulli model with parameter $2\rho$, and the signs of $e^*$ are i.i.d. $\pm1$ with probability $1/2$. Then, if the solution of the extended $\ell_1$-minimization (5) is exact with high probability, it is also exact with at least the same probability under the model in which the signs of $e^*$ are fixed and its nonzero entries are selected from the Bernoulli model with parameter $\rho$.

B. Dual certificate 

The following lemma shows that if there exists a dual pair $(z^{(x)}, z^{(e)})$ satisfying certain conditions, then for any feasible perturbation of $(x^*, e^*)$, the objective $\|x\|_1 + \lambda\|e\|_1$ is no smaller than that of $(x^*, e^*)$.

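Lemma 2. Suppose that $\|A_{J^cT}\| < 1$. If there exists a pair of dual vectors $(z^{(x)}, z^{(e)})$ with the following properties,

1) $z^{(x)} = \lambda A^*_{\Omega\cdot} z^{(e)}$,

2) $z^{(x)}_T = \mathrm{sgn}(x^*_T)$ and $\|z^{(x)}_{T^c}\|_\infty \le 3/4$,

3) $z^{(e)}_S = \mathrm{sgn}(e^*_S)$ and $\|z^{(e)}_J\|_\infty \le 3/4$,

then for any perturbation pair $(h, f)$ satisfying $f = -A_{\Omega\cdot}h$, we have

$$\|x^*+h\|_1 + \lambda\|e^*+f\|_1 \ge \|x^*\|_1 + \lambda\|e^*\|_1 + \frac14\left(\|h_{T^c}\|_1 + \lambda\|A_{J\cdot}h\|_1\right). \qquad (22)$$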



Before proving this lemma, we note how it implies perfect recovery by the linear program in (5). Indeed, denote by $(\hat x, \hat e)$ the optimal solution of (5) and write $\hat x := x^* + h$ and $\hat e := e^* + f$; since both pairs satisfy the constraint of (5), $f = -A_{\Omega\cdot}h$. By the optimality of $(\hat x, \hat e)$, we have $\|x^*+h\|_1 + \lambda\|e^*+f\|_1 \le \|x^*\|_1 + \lambda\|e^*\|_1$.

Furthermore, by Lemma 2, assuming the existence of a dual pair $(z^{(x)}, z^{(e)})$ and $\|A_{J^cT}\| < 1$, inequality (22) holds. Combining both arguments, we have

$$\frac14\left(\|h_{T^c}\|_1 + \lambda\|A_{J\cdot}h\|_1\right) \le 0.$$

The left-hand side is nonnegative, so equality requires $h_{T^c} = 0$ and $A_{J\cdot}h = 0$, that is, $A_{JT}h_T = 0$. Due to the orthogonality of $A$, the condition $\|A_{J^cT}\| < 1$ is equivalent to $\|I - A^*_{JT}A_{JT}\| < 1$. This implies that $A^*_{JT}A_{JT}$ is invertible, and thus $A_{JT}h_T = 0$ only if $h_T = 0$. We therefore conclude that $h = 0$ and $f = -A_{\Omega\cdot}h = 0$; in other words, $(\hat x, \hat e)$ is the exact solution.

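Proof of Lemma 2. Denote by $v_0$ and $w_0$ subgradients of $\|x\|_1$ and $\|e\|_1$ at $x^*$ and $e^*$, respectively. It is well known that $v_{0T} = \mathrm{sgn}(x^*_T)$ and $\|v_{0T^c}\|_\infty \le 1$. Similarly, $w_{0S} = \mathrm{sgn}(e^*_S)$ and $\|w_{0S^c}\|_\infty \le 1$. By the definition of subgradients, we derive

$$\begin{aligned}\|x^*+h\|_1 + \lambda\|e^*+f\|_1 &\ge \|x^*\|_1 + \lambda\|e^*\|_1 + \langle v_0, h\rangle + \lambda\langle w_0, f\rangle\\ &= \|x^*\|_1 + \lambda\|e^*\|_1 + \langle v_0, h\rangle - \lambda\langle w_0, A_{\Omega\cdot}h\rangle.\end{aligned} \qquad (23)$$

Let us now consider $\langle v_0, h\rangle - \lambda\langle w_0, A_{\Omega\cdot}h\rangle$. Decomposing $v_0$ over $T$ and $T^c$, and $w_0$ over $S$ and $J = \Omega\setminus S$, we have

$$\langle v_0, h\rangle - \lambda\langle w_0, A_{\Omega\cdot}h\rangle = \langle \mathrm{sgn}(x^*_T), h_T\rangle - \lambda\langle \mathrm{sgn}(e^*_S), A_{S\cdot}h\rangle + \langle v_{0T^c}, h_{T^c}\rangle - \lambda\langle w_{0J}, A_{J\cdot}h\rangle.$$

Now choosing $v_{0T^c}$ such that $\langle v_{0T^c}, h_{T^c}\rangle = \|h_{T^c}\|_1$ and $w_{0J}$ such that $\langle w_{0J}, A_{J\cdot}h\rangle = -\|A_{J\cdot}h\|_1$, we can rewrite

$$\langle v_0, h\rangle - \lambda\langle w_0, A_{\Omega\cdot}h\rangle = \langle \mathrm{sgn}(x^*_T), h_T\rangle - \lambda\langle \mathrm{sgn}(e^*_S), A_{S\cdot}h\rangle + \|h_{T^c}\|_1 + \lambda\|A_{J\cdot}h\|_1. \qquad (24)$$

In addition, the identity $z^{(x)} = \lambda A^*_{\Omega\cdot}z^{(e)}$ can be reformulated as

$$\begin{bmatrix}\mathrm{sgn}(x^*_T)\\ z^{(x)}_{T^c}\end{bmatrix} = \lambda A^*_{S\cdot}\,\mathrm{sgn}(e^*_S) + \lambda A^*_{J\cdot}\,z^{(e)}_J.$$

Taking the inner product with $h$ on both sides yields

$$\langle \mathrm{sgn}(x^*_T), h_T\rangle + \langle z^{(x)}_{T^c}, h_{T^c}\rangle = \lambda\langle A^*_{S\cdot}\mathrm{sgn}(e^*_S), h\rangle + \lambda\langle z^{(e)}_J, A_{J\cdot}h\rangle.$$

Notice also that $\langle A^*_{S\cdot}\mathrm{sgn}(e^*_S), h\rangle = \langle \mathrm{sgn}(e^*_S), A_{S\cdot}h\rangle$. Hence, (24) is equivalent to

$$\begin{aligned}\langle v_0, h\rangle - \lambda\langle w_0, A_{\Omega\cdot}h\rangle &= \|h_{T^c}\|_1 + \lambda\|A_{J\cdot}h\|_1 - \langle z^{(x)}_{T^c}, h_{T^c}\rangle + \lambda\langle z^{(e)}_J, A_{J\cdot}h\rangle\\ &\ge \frac14\left(\|h_{T^c}\|_1 + \lambda\|A_{J\cdot}h\|_1\right),\end{aligned}$$

where the last inequality is due to $|\langle z^{(x)}_{T^c}, h_{T^c}\rangle| \le \|z^{(x)}_{T^c}\|_\infty\|h_{T^c}\|_1 \le \frac34\|h_{T^c}\|_1$ and $|\langle z^{(e)}_J, A_{J\cdot}h\rangle| \le \|z^{(e)}_J\|_\infty\|A_{J\cdot}h\|_1 \le \frac34\|A_{J\cdot}h\|_1$. Substituting this inequality into (23), we complete the proof. □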



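From the result of Lemma 2, in order to prove exact recovery by the convex program, it suffices to construct a dual certificate $(z^{(x)}, z^{(e)})$ obeying the conditions of Lemma 2. Partitioning $z^{(x)}$ over $T$ and $T^c$, and $z^{(e)}$ over $S$ and $J$, the identity relation between $z^{(x)}$ and $z^{(e)}$ can be reformulated as follows:

$$\begin{bmatrix}\mathrm{sgn}(x^*_T)\\ z^{(x)}_{T^c}\end{bmatrix} = \lambda A^*\begin{bmatrix}\mathrm{sgn}(e^*_S)\\ z^{(e)}_J\\ 0_{\Omega^c}\end{bmatrix}, \qquad (25)$$

where the entries of the vector on the right are grouped according to the row sets $S$, $J$, and $\Omega^c$ of $A$. If we can construct a pair of vectors $(v^{(x)}, w^{(e)})$ such that $v^{(x)} + w^{(e)}$ is equal to both sides of (25), that is,

$$v^{(x)}_T + w^{(e)}_T = \mathrm{sgn}(x^*_T),\quad A_{S\cdot}(v^{(x)} + w^{(e)}) = \lambda\,\mathrm{sgn}(e^*_S),\quad A_{J\cdot}(v^{(x)} + w^{(e)}) = \lambda z^{(e)}_J,\quad A_{\Omega^c\cdot}(v^{(x)} + w^{(e)}) = 0,$$

then the existence of the dual certificate $(z^{(x)}, z^{(e)})$ in Lemma 2 is guaranteed. As a consequence, it now suffices to produce a dual pair $(v^{(x)}, w^{(e)})$ obeying

$$\begin{cases} v^{(x)}_T = \mathrm{sgn}(x^*_T), & \|v^{(x)}_{T^c}\|_\infty \le 3/8,\\ A_{S\cdot}v^{(x)} = 0, & \|A_{J\cdot}v^{(x)}\|_\infty \le 3\lambda/8,\\ A_{\Omega^c\cdot}v^{(x)} = 0, \end{cases} \qquad \begin{cases} w^{(e)}_T = 0, & \|w^{(e)}_{T^c}\|_\infty \le 3/8,\\ A_{S\cdot}w^{(e)} = \lambda\,\mathrm{sgn}(e^*_S), & \|A_{J\cdot}w^{(e)}\|_\infty \le 3\lambda/8,\\ A_{\Omega^c\cdot}w^{(e)} = 0. \end{cases} \qquad (26)$$

Indeed, setting $z^{(x)} := v^{(x)} + w^{(e)}$ and $z^{(e)}_J := \lambda^{-1}A_{J\cdot}(v^{(x)} + w^{(e)})$ then gives $\|z^{(x)}_{T^c}\|_\infty \le 3/4$ and $\|z^{(e)}_J\|_\infty \le 3/4$. In what follows, we establish that a valid dual pair $(v^{(x)}, w^{(e)})$ exists with probability converging to one.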



C. Dual certificate constructions

We now construct a dual certificate pair $(v^{(x)}, w^{(e)})$ whose components are described as follows.

1) Construction of $w^{(e)}$ via least squares. Since $w^{(e)}_T = 0$, the identity conditions $A_{S\cdot}w^{(e)} = \lambda\,\mathrm{sgn}(e^*_S)$ and $A_{\Omega^c\cdot}w^{(e)} = 0$ can now be represented by a single equation

$$A_{J^cT^c}\,w^{(e)}_{T^c} = \lambda\begin{bmatrix}\mathrm{sgn}(e^*_S)\\ 0_{\Omega^c}\end{bmatrix}, \qquad (27)$$

where we recall that $J^c = S \cup \Omega^c$. Next, assuming that $\|A_{J^cT}\| < 1$, we have $\|I - A_{J^cT^c}A^*_{J^cT^c}\| = \|A_{J^cT}A^*_{J^cT}\| < 1$. Consequently, the matrix $A_{J^cT^c}A^*_{J^cT^c}$ is invertible. We then set

$$w^{(e)}_{T^c} = \lambda A^*_{J^cT^c}\left(A_{J^cT^c}A^*_{J^cT^c}\right)^{-1}\begin{bmatrix}\mathrm{sgn}(e^*_S)\\ 0_{\Omega^c}\end{bmatrix}. \qquad (28)$$

Clearly, $w^{(e)}_{T^c}$ is the least-squares solution of the linear system in (27). This construction has a natural interpretation: among all solutions of the linear system, $w^{(e)}_{T^c}$ has the minimum $\ell_2$-norm. We expect that its $\ell_\infty$-norm is also sufficiently small to obey the condition in (26).

2) Construction of $v^{(x)}$. A simple way to produce $v^{(x)}$ is as follows:

$$v^{(x)} = A^*_{J\cdot}A_{JT}\left(A^*_{JT}A_{JT}\right)^{-1}\mathrm{sgn}(x^*_T). \qquad (29)$$

It is obvious from this construction that $v^{(x)}_T = \mathrm{sgn}(x^*_T)$. Furthermore, $A_{S\cdot}v^{(x)} = 0$ and $A_{\Omega^c\cdot}v^{(x)} = 0$ due to the orthogonality of $A$ (the rows indexed by $S$ and $\Omega^c$ are orthogonal to those indexed by $J$). Thus, all three identity relations with respect to $v^{(x)}$ in (26) are guaranteed.

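The two constructions (28) and (29) can be checked numerically with a small sketch. It assumes NumPy and uses an orthonormal DCT-II matrix as a stand-in for the orthogonal $A$ (the paper's experiments use the Fourier matrix); the helper names `dct_matrix` and `dual_pair`, the sizes, and the value of `lam` are hypothetical choices for illustration only.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here as a stand-in real orthogonal A."""
    k = np.arange(n)[:, None]                 # row (frequency) index
    i = np.arange(n)[None, :]                 # column (time) index
    A = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    A[0] *= np.sqrt(1.0 / n)
    A[1:] *= np.sqrt(2.0 / n)
    return A

def dual_pair(A, T, S, J, sgn_x, sgn_e, lam):
    """Hypothetical helper: v^(x) from (29) and w^(e) from (28).

    T, S, J are sorted index arrays; sgn_e is ordered like S.
    """
    n = A.shape[0]
    Tc = np.setdiff1d(np.arange(n), T)
    Jc = np.setdiff1d(np.arange(n), J)        # J^c = S U Omega^c, sorted
    AJT = A[np.ix_(J, T)]
    v = A[J].T @ (AJT @ np.linalg.solve(AJT.T @ AJT, sgn_x))
    rhs = np.zeros(Jc.size)                   # lam * [sgn(e*_S); 0 on Omega^c]
    rhs[np.searchsorted(Jc, S)] = lam * sgn_e
    B = A[np.ix_(Jc, Tc)]                     # A_{J^c T^c}
    w = np.zeros(n)
    w[Tc] = B.T @ np.linalg.solve(B @ B.T, rhs)   # minimum-norm solution of (27)
    return v, w

rng = np.random.default_rng(1)
n, m, s, k, lam = 128, 64, 8, 3, 0.5          # lam chosen arbitrarily here
A = dct_matrix(n)
omega = np.sort(rng.choice(n, m, replace=False))
omega_c = np.setdiff1d(np.arange(n), omega)
S = np.sort(rng.choice(omega, s, replace=False))
J = np.setdiff1d(omega, S)
T = np.sort(rng.choice(n, k, replace=False))
Tc = np.setdiff1d(np.arange(n), T)
sgn_x = rng.choice([-1.0, 1.0], k)
sgn_e = rng.choice([-1.0, 1.0], s)
v, w = dual_pair(A, T, S, J, sgn_x, sgn_e, lam)
print("||v_Tc||_inf:", np.abs(v[Tc]).max(), "||A_J v||_inf:", np.abs(A[J] @ v).max())
print("||w_Tc||_inf:", np.abs(w[Tc]).max(), "||A_J w||_inf:", np.abs(A[J] @ w).max())
```

The equality constraints in (26) hold by construction and are exact up to rounding; the printed $\ell_\infty$-norms are the quantities that Lemmas 3 and 4 control, and at toy sizes like these they are not guaranteed to fall below $3/8$ and $3\lambda/8$.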
We now state two key lemmas that establish the $\ell_\infty$-norm bounds for $v^{(x)}$ and $w^{(e)}$.

Lemma 3. Assume that $\Omega \sim \mathrm{Ber}(\eta)$ and $S \sim \mathrm{Ber}(\eta\rho)$, where $\eta = m/n$ and $\rho = s/m$. Under the same assumptions as in Theorem 2, with high probability the dual vector constructed in (29) obeys

1) $\|A_{J\cdot}v^{(x)}\|_\infty \le 3\lambda/8$,

2) $\|v^{(x)}_{T^c}\|_\infty \le 3/8$.

Lemma 4. Assume that $\Omega$ and $S$ are sampled as in Lemma 3. Under the same assumptions as in Theorem 2, with high probability the dual vector constructed in (28) obeys

1) $\|w^{(e)}_{T^c}\|_\infty \le 3/8$,

2) $\|A_{J\cdot}w^{(e)}\|_\infty \le 3\lambda/8$.

Lemmas 3 and 4 imply the existence of $(z^{(x)}, z^{(e)})$. In other words, the solution of the convex program in (5) is exact and unique.

IV. Proofs of dual certificates 

A. Important auxiliary lemmas 

In this section, we first develop several auxiliary results used in the main proof.

Lemma 5. Let $S_0$ be a set of locations sampled randomly from $\{1,\dots,n\}$, $S_0 \sim \mathrm{Ber}(\rho_0)$. With probability at least $1 - n^{-1}$, we have

$$\left\|I_{k\times k} - \rho_0^{-1}A^*_{S_0T}A_{S_0T}\right\| \le \epsilon,$$

provided that $\rho_0 \ge C_0\,\epsilon^{-2}\mu k\log n/n$, where $C_0$ is a numerical constant.

This result is known in the literature [26], [27], [28]. For completeness, however, we provide a brief proof that relies on a high-order moment bound for the spectral norm. We emphasize that the lemma is important since it provides a bound on $\|A_{J^cT}\|$. Indeed, recalling that $J \sim \mathrm{Ber}(\rho_0)$ with $\rho_0 = \eta(1-\rho)$, Lemma 5 implies

$$\left\|I - \rho_0^{-1}A^*_{JT}A_{JT}\right\| \le \epsilon, \qquad (30)$$

provided that $\rho_0 \ge C_0\,\epsilon^{-2}\mu k\log n/n$. Furthermore, from the fact that $A^*_{JT}A_{JT} = I - A^*_{J^cT}A_{J^cT}$, we obtain

$$\epsilon \ge \left\|I - \rho_0^{-1}\left(I - A^*_{J^cT}A_{J^cT}\right)\right\| \ge \rho_0^{-1}\left\|A^*_{J^cT}A_{J^cT}\right\| - \left(\rho_0^{-1} - 1\right).$$

This inequality leads to $\|A^*_{J^cT}A_{J^cT}\| \le \rho_0\epsilon + (1 - \rho_0)$. Choosing $\epsilon = 1/2$, we summarize the argument in the following proposition.

Proposition 1. Provided that $m - s \ge 4C_0\mu k\log n$, with probability at least $1 - n^{-1}$ we have

$$\|A_{J^cT}\| \le \sqrt{1 - \rho_0/2}.$$

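Proof of Lemma 5. Define $S_0 = \{i : \delta_i = 1\}$, where $\{\delta_i\}$ is an independent sequence of Bernoulli variables with parameter $\rho_0$, and denote by $u_i$, $i = 1,\dots,n$, the rows of $A_{\cdot T}$ (so that the rows of $A_{S_0T}$ are the $u_i$ with $i \in S_0$). With this notation, we have

$$A^*_{S_0T}A_{S_0T} = \sum_{i\in S_0} u_i \otimes u_i = \sum_{i=1}^n \delta_i\, u_i \otimes u_i.$$

Applying Theorem 5 of [28] with $q = \log n$, we obtain

$$\left(\mathbb{E}\left\|I - \rho_0^{-1}\sum_{i=1}^n \delta_i\, u_i\otimes u_i\right\|^q\right)^{1/q} \le C\sqrt{\rho_0^{-1}\log n}\;\max_i\|u_i\|_2 \le C\sqrt{\frac{\mu k\log n}{\rho_0 n}} := E,$$

where the constant $C = 2^{3/4}\sqrt{\pi e}$, and the last inequality holds because $\|u_i\|_2 \le \sqrt{\mu k/n}$. From Markov's inequality, we can establish

$$\mathbb{P}\left(\left\|I - \rho_0^{-1}\sum_{i=1}^n \delta_i\, u_i\otimes u_i\right\| > \epsilon\right) \le \left(\frac{E}{\epsilon}\right)^q.$$

By the assumption of the lemma, $\frac{C}{\epsilon}\sqrt{\frac{\mu k\log n}{\rho_0 n}} \le e^{-1}$, so with $q = \log n$ we have, with probability at least $1 - n^{-1}$,

$$\left\|I - \rho_0^{-1}A^*_{S_0T}A_{S_0T}\right\| \le \epsilon,$$

as claimed. □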



The next lemma shows that the matrix $A_{S_0T}$ is almost orthogonal to the matrix $A_{S_0T^c}$, where $S_0$ is a random subset of the rows of $A$. This property is important in distinguishing the set $T$ from $T^c$ and helps the algorithm identify the true support of $x^*$. We defer the proof to the Appendix.

Lemma 6. Let $S_0$ be a set of locations sampled randomly from $\{1,\dots,n\}$, $S_0 \sim \mathrm{Ber}(\rho_0)$. With probability at least $1 - 3n^{-1}$, the inequality

$$\|A^*_{S_0T}u\|_2 \le \sqrt{\frac{C'\rho_0\,\mu\max\{k,\log n\}}{n}}$$

holds for every column vector $u$ of the matrix $A_{S_0T^c}$, provided that $\rho_0 \ge C\mu\max\{k,\log n\}/n$, where $C$ and $C'$ are numerical constants.

We are now ready to prove Lemmas 3 and 4 regarding the dual certificates.

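The quantities controlled by Lemma 5, Proposition 1, and Lemma 6 can be inspected on a single random draw with the following sketch. It assumes NumPy and the unitary DFT (coherence $\mu = 1$); the sizes and the grid of $\rho_0$ values are arbitrary illustrative choices, and one draw illustrates rather than verifies the high-probability statements.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 1024, 5
A = np.fft.fft(np.eye(n), norm="ortho")        # unitary DFT matrix
mu = n * np.max(np.abs(A) ** 2)                # coherence; equals 1 for the DFT
T = np.sort(rng.choice(n, k, replace=False))
Tc = np.setdiff1d(np.arange(n), T)

records = []
for rho0 in (0.05, 0.1, 0.2, 0.4):
    S0 = np.flatnonzero(rng.random(n) < rho0)  # S0 ~ Ber(rho0)
    S0c = np.setdiff1d(np.arange(n), S0)
    AS0T, AS0cT = A[np.ix_(S0, T)], A[np.ix_(S0c, T)]
    G = AS0T.conj().T @ AS0T                   # A*_{S0 T} A_{S0 T}
    lemma5 = np.linalg.norm(np.eye(k) - G / rho0, 2)
    prop1 = np.linalg.norm(AS0cT, 2)           # ||A_{S0^c T}||, cf. Proposition 1
    # Lemma 6: largest ||A*_{S0 T} u||_2 over the columns u of A_{S0 T^c}
    lemma6 = np.linalg.norm(AS0T.conj().T @ A[np.ix_(S0, Tc)], axis=0).max()
    scale = np.sqrt(rho0 * mu * max(k, np.log(n)) / n)
    records.append((rho0, lemma5, prop1, lemma6, G, AS0cT))
    print(f"rho0={rho0:.2f}  Lemma 5 deviation={lemma5:.3f}  "
          f"||A_(S0^c,T)||={prop1:.3f}  Lemma 6 max={lemma6:.3f}  "
          f"sqrt(rho0*mu*max(k,log n)/n)={scale:.3f}")
```

The last column is the scale appearing in Lemma 6 without its unknown constant $C'$, so only the trend, not the absolute comparison, is meaningful. The identity $A^*_{S_0T}A_{S_0T} + A^*_{S_0^cT}A_{S_0^cT} = I$ used to derive Proposition 1 holds exactly.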


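B. Proof of Lemma 3

Part 1. By the construction of $v^{(x)}$ in (29) and $A_{J\cdot}A^*_{J\cdot} = I$,

$$A_{J\cdot}v^{(x)} = A_{JT}\left(A^*_{JT}A_{JT}\right)^{-1}\mathrm{sgn}(x^*_T).$$

Denoting by $u_i$ the $i$-th row of the matrix $A_{JT}$, we have

$$\|A_{J\cdot}v^{(x)}\|_\infty = \max_i\left|u_i\left(A^*_{JT}A_{JT}\right)^{-1}\mathrm{sgn}(x^*_T)\right| = \max_i\left|\langle Wu_i^*, \mathrm{sgn}(x^*_T)\rangle\right|,$$

where $W := (A^*_{JT}A_{JT})^{-1}$. Each inner product on the right-hand side is a sum of zero-mean random variables, which can be bounded by Hoeffding's inequality. Hence,

$$\mathbb{P}\left(\left|\langle Wu_i^*, \mathrm{sgn}(x^*_T)\rangle\right| > t\right) \le 2\exp\left(-\frac{t^2}{2\|Wu_i^*\|_2^2}\right).$$

Notice from (30) that, with probability converging to one, $\|I - \rho_0^{-1}A^*_{JT}A_{JT}\| \le \epsilon$. Thus $(1-\epsilon)\rho_0 \le \sigma_{\min}(A^*_{JT}A_{JT}) \le \sigma_{\max}(A^*_{JT}A_{JT}) \le (1+\epsilon)\rho_0$, where $\sigma_{\min}$ and $\sigma_{\max}$ are the minimum and maximum singular values of the matrix. In addition, the spectral norm of any invertible matrix $H$ obeys $\|H^{-1}\| = 1/\sigma_{\min}(H)$. Thus, conditioning on the event $\mathcal{E} = \{\|I - \rho_0^{-1}A^*_{JT}A_{JT}\| \le \epsilon\}$, we have

$$\|W\| \le \frac{1}{(1-\epsilon)\rho_0} \le \frac{2}{\rho_0}$$

with the choice $\epsilon \le 1/2$. Consequently, combining this with $\|u_i\|_2^2 \le \mu k/n$, we conclude that

$$\|Wu_i^*\|_2^2 \le \frac{4\mu k}{\rho_0^2 n}.$$

Now setting $t := \sqrt{16\mu k\log n/(\rho_0^2 n)}$ and taking the union bound over all rows of $A_{JT}$, we obtain

$$\mathbb{P}\left(\|A_{J\cdot}v^{(x)}\|_\infty > \sqrt{\frac{16\mu k\log n}{\rho_0^2 n}}\right) \le 2n\,e^{-2\log n} + \mathbb{P}(\mathcal{E}^c) \le 3n^{-1},$$

where the first inequality follows from the total probability rule $\mathbb{P}(F > t) \le \mathbb{P}(F > t \mid \mathcal{E}) + \mathbb{P}(\mathcal{E}^c)$ with $F := \|A_{J\cdot}v^{(x)}\|_\infty$. We conclude that $\|A_{J\cdot}v^{(x)}\|_\infty \le 3\lambda/8$ with high probability as long as

$$k \le \frac{9}{1024}\,\frac{\lambda^2\rho_0^2 n}{\mu\log n}.$$

Substituting the value of $\lambda$, $\rho_0 = (m-s)/n$, and $s = \gamma m$, one can see that this upper bound on $k$ follows automatically from the assumption on $k$ in Theorem 2. □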

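The Hoeffding step above, together with the union-bound bookkeeping that turns a per-row tail into a $2/n$ failure probability, can be sanity-checked with a short Monte Carlo sketch. It assumes NumPy; the weight vector `a` stands in for a generic row such as $Wu_i^*$ and is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal(40)
a /= np.linalg.norm(a)                              # weights with ||a||_2 = 1
signs = rng.choice([-1.0, 1.0], size=(100_000, a.size))   # i.i.d. random signs
sums = signs @ a                                    # samples of <a, sgn>
results = []
for t in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(sums) > t)
    bound = 2 * np.exp(-t**2 / (2 * (a @ a)))      # Hoeffding: 2 exp(-t^2 / (2||a||^2))
    results.append((t, empirical, bound))
    print(f"t={t}: empirical tail {empirical:.4f} <= bound {bound:.4f}")

# Union-bound bookkeeping: if t^2 = 4 ||a||^2 log n, each tail is at most
# 2 n^{-2}, so a union over n rows costs at most 2 / n.
n = 1024
t = np.sqrt(4 * (a @ a) * np.log(n))
union = n * 2 * np.exp(-t**2 / (2 * (a @ a)))
print(union, 2 / n)
```

This mirrors the choice $t^2 = 16\mu k\log n/(\rho_0^2 n) = 4\cdot\frac{4\mu k}{\rho_0^2 n}\log n$ made in Part 1; the Hoeffding bound is typically loose, which is why the empirical tails sit well below it.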
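Part 2. In this part, we need to show that, with high probability,

$$\left\|A^*_{JT^c}A_{JT}\left(A^*_{JT}A_{JT}\right)^{-1}\mathrm{sgn}(x^*_T)\right\|_\infty \le 3/8.$$

Denote by $u_i$ a column of the matrix $A_{JT^c}$ and consider

$$u_i^*A_{JT}\left(A^*_{JT}A_{JT}\right)^{-1}\mathrm{sgn}(x^*_T) = \left\langle \left(A^*_{JT}A_{JT}\right)^{-1}A^*_{JT}u_i,\ \mathrm{sgn}(x^*_T)\right\rangle,$$

which is a sum of zero-mean random variables. Its absolute value can be estimated via Hoeffding's inequality:

$$\mathbb{P}\left(\left|u_i^*A_{JT}\left(A^*_{JT}A_{JT}\right)^{-1}\mathrm{sgn}(x^*_T)\right| > t\right) \le 2\exp\left(-\frac{t^2}{2\|z\|_2^2}\right),$$

where $z := (A^*_{JT}A_{JT})^{-1}A^*_{JT}u_i$. As shown previously, conditioning on the event $\mathcal{E}_1 = \{\|I - \rho_0^{-1}A^*_{JT}A_{JT}\| \le \epsilon \le 1/2\}$, we have $\|(A^*_{JT}A_{JT})^{-1}\| \le 2/\rho_0$. In addition, we define the event

$$\mathcal{E}_2 := \left\{\|A^*_{JT}u_i\|_2 \le \sqrt{\frac{C'\rho_0\,\mu\max\{k,\log n\}}{n}}\ \text{for every column } u_i \text{ of } A_{JT^c}\right\},$$

which bounds the $\ell_2$-norm of $A^*_{JT}u_i$ with $J \sim \mathrm{Ber}(\rho_0)$. By Lemma 6, $\mathbb{P}(\mathcal{E}_2) \ge 1 - 3n^{-1}$. Therefore, conditioning on both $\mathcal{E}_1$ and $\mathcal{E}_2$, we get

$$\|z\|_2 \le \left\|\left(A^*_{JT}A_{JT}\right)^{-1}\right\|\,\|A^*_{JT}u_i\|_2 \le \frac{2}{\rho_0}\sqrt{\frac{C'\rho_0\,\mu\max\{k,\log n\}}{n}} = \sqrt{\frac{4C'\mu\max\{k,\log n\}}{\rho_0 n}}.$$

Setting $t^2 := 16C'\mu\max\{k,\log n\}\log n/(\rho_0 n)$ and taking the union bound, we conclude that

$$\mathbb{P}\left(\|v^{(x)}_{T^c}\|_\infty > t\right) \le 2(n-k)e^{-2\log n} + \mathbb{P}(\mathcal{E}_1^c) + \mathbb{P}(\mathcal{E}_2^c),$$

which is less than $6n^{-1}$. Now substitute $\rho_0 = (m-s)/n$ and assume that $m - s \ge C\mu\max\{k,\log n\}\log n$ with $C = 16(8/3)^2C'$; then $t \le 3/8$, and we achieve $\|v^{(x)}_{T^c}\|_\infty \le 3/8$, as claimed. □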



C. Proof of Lemma 4

1) Preliminary results: To establish the bounds of Lemma 4, we need an estimate of the spectral norm $\|A_{ST}\|$. The following proposition provides such a bound.

Proposition 2. With probability at least $1 - n^{-1}$,

$$\|A^*_{ST}A_{ST}\| \le \eta\rho\left(1 + \sqrt{\frac{C_0\,\mu k\log n}{s}}\right).$$

Proof. Recall that $S \sim \mathrm{Ber}(\eta\rho)$. By Lemma 5, with high probability,

$$\left\|I - (\eta\rho)^{-1}A^*_{ST}A_{ST}\right\| \le \epsilon_1, \qquad (31)$$

provided $\eta\rho \ge C_0\,\epsilon_1^{-2}\mu k\log n/n$. Note that $\eta\rho = s/n$; thus the condition is equivalent to $s \ge C_0\,\epsilon_1^{-2}\mu k\log n$. This inequality is automatically satisfied by setting $\epsilon_1 = \sqrt{C_0\mu k\log n/s}$. Therefore, (31) gives us

$$\|A^*_{ST}A_{ST}\| \le \eta\rho\left(1 + \sqrt{\frac{C_0\,\mu k\log n}{s}}\right),$$

as claimed. □

2) Main proofs:



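Part 1. We start with the construction of $w^{(e)}$ in (28). Our goal is to show that, with high probability,

$$V := \lambda\left\|A^*_{J^cT^c}\left(A_{J^cT^c}A^*_{J^cT^c}\right)^{-1}\begin{bmatrix}\mathrm{sgn}(e^*_S)\\ 0_{\Omega^c}\end{bmatrix}\right\|_\infty \le \frac14.$$

By the series expansion $(I - H)^{-1} = I + \sum_{j\ge1}H^j$, valid whenever $\|H\| < 1$, and the identity $A_{J^cT^c}A^*_{J^cT^c} = I - A_{J^cT}A^*_{J^cT}$, we have

$$A^*_{J^cT^c}\left(A_{J^cT^c}A^*_{J^cT^c}\right)^{-1} = A^*_{J^cT^c}\left(I - A_{J^cT}A^*_{J^cT}\right)^{-1} = A^*_{J^cT^c} + A^*_{J^cT^c}\sum_{j\ge1}\left(A_{J^cT}A^*_{J^cT}\right)^j.$$

Denote $r := [\mathrm{sgn}(e^*_S); 0_{\Omega^c}]$. To establish the upper bound on $V$, we bound the $\ell_\infty$-norms of the two quantities corresponding to the summands of this expansion; the bound on $V$ then follows from the triangle inequality. For the first term $V_1 := \lambda\|A^*_{J^cT^c}r\|_\infty$, we have

$$V_1 = \lambda\left\|A^*_{ST^c}\mathrm{sgn}(e^*_S)\right\|_\infty = \lambda\max_{i\in T^c}\left|\langle u_i, \mathrm{sgn}(e^*_S)\rangle\right|,$$

where $u_i$ denotes the $i$-th column of $A_{ST^c}$. Note that $\langle u_i, \mathrm{sgn}(e^*_S)\rangle$ is a sum of zero-mean random variables (by the random sign assumption on $e^*_S$). Applying Hoeffding's inequality yields

$$\mathbb{P}\left(\left|\langle u_i, \mathrm{sgn}(e^*_S)\rangle\right| > t\right) \le 2\exp\left(-\frac{t^2}{2\|u_i\|_2^2}\right) \le 2\exp\left(-\frac{t^2 n}{2\mu s}\right),$$

where the last inequality is due to $\|u_i\|_2^2 \le \mu s/n$. Next, choosing $t = 1/(8\lambda)$ and taking the union bound over all $i \in T^c$ yields

$$\mathbb{P}\left(V_1 > \frac18\right) \le \exp\left(-\frac{n}{128\,\mu\lambda^2 s} + \log(2n)\right),$$

which is bounded by $e^{-\log n} = n^{-1}$ as long as $s \le Cn/(\mu\lambda^2\log n)$ for a sufficiently small numerical constant $C$.

For the remainder term, denote

$$V_r := \lambda\left\|\sum_{j\ge1}A^*_{J^cT^c}\left(A_{J^cT}A^*_{J^cT}\right)^j r\right\|_\infty.$$

Writing $(A_{J^cT}A^*_{J^cT})^j = A_{J^cT}(A^*_{J^cT}A_{J^cT})^{j-1}A^*_{J^cT}$ for $j \ge 1$, using $A^*_{J^cT}r = A^*_{ST}\mathrm{sgn}(e^*_S)$, and using $A^*_{J^cT^c}A_{J^cT} = -A^*_{JT^c}A_{JT}$ (the columns of $A$ are orthonormal), we have

$$V_r = \lambda\max_{i\in T^c}\left|\sum_{j\ge0}u_i^*A_{JT}\left(A^*_{J^cT}A_{J^cT}\right)^jA^*_{ST}\mathrm{sgn}(e^*_S)\right|,$$

where $u_i$ now denotes the $i$-th column of $A_{JT^c}$. Notice that $u_i$ has length $m - s$.

Let $W := \sum_{j\ge0}A_{JT}(A^*_{J^cT}A_{J^cT})^jA^*_{ST}$. We consider the term inside the max, $V_i = |\langle W^*u_i, \mathrm{sgn}(e^*_S)\rangle|$. Again, this quantity is bounded by Hoeffding's inequality:

$$\mathbb{P}\left(\left|\langle W^*u_i, \mathrm{sgn}(e^*_S)\rangle\right| > t\right) \le 2\exp\left(-\frac{t^2}{2\|W^*u_i\|_2^2}\right).$$

Next, we have

$$\|W^*u_i\|_2 \le \|A_{ST}\|\left\|\sum_{j\ge0}\left(A^*_{J^cT}A_{J^cT}\right)^j\right\|\|A^*_{JT}\|\,\|u_i\|_2 \le \frac{\|A_{ST}\|\,\|A^*_{JT}\|}{1 - \|A^*_{J^cT}A_{J^cT}\|}\,\|u_i\|_2.$$

We now bound the spectral and $\ell_2$-norms of these terms. Define the following three events:

$$\mathcal{E}_1 := \left\{\|A^*_{J^cT}A_{J^cT}\| \le 1 - \rho_0/2\right\},\quad \mathcal{E}_2 := \left\{\|A^*_{JT}\| \le \sqrt{3\rho_0/2}\right\},\quad \mathcal{E}_3 := \left\{\|A_{ST}\| \le \left(1 + \sqrt{\tfrac{C_0\mu k\log n}{s}}\right)^{1/2}\sqrt{\eta\rho}\right\}.$$

Recall from Proposition 1 that $\mathcal{E}_1$ occurs with high probability. Moreover, from Lemma 5, with high probability $\|I_{k\times k} - \rho_0^{-1}A^*_{JT}A_{JT}\| \le \epsilon$ provided $\rho_0 \ge C_0\,\epsilon^{-2}\mu k\log n/n$. Thus $\|A^*_{JT}\| \le \sqrt{\rho_0(1+\epsilon)} \le \sqrt{3\rho_0/2}$, assuming that $\epsilon \le 1/2$. Finally, $\mathcal{E}_3$ occurs by Proposition 2, and $\|u_i\|_2^2 \le \mu(m-s)/n = \mu\rho_0$. Conditioning on these events, we conclude that

$$\|W^*u_i\|_2^2 \le \frac{(3\rho_0/2)(\mu\rho_0)(\eta\rho)}{(\rho_0/2)^2}\left(1 + \sqrt{\frac{C_0\,\mu k\log n}{s}}\right) = \frac{6\mu s}{n}\left(1 + \sqrt{\frac{C_0\,\mu k\log n}{s}}\right),$$

where we used $\eta\rho = s/n$. We consider the following two cases regarding the size of the set $S$.

Case 1: if $s \ge C_0\mu k\log n$, then $\|W^*u_i\|_2^2 \le 12\mu s/n$. Setting $t = 1/(8\lambda)$ and taking the union bound over all $i \in T^c$, we attain

$$\mathbb{P}\left(V_r > \frac18\right) \le 2n\exp\left(-\frac{n}{1536\,\mu\lambda^2 s}\right) + \mathbb{P}(\mathcal{E}_1^c) + \mathbb{P}(\mathcal{E}_2^c) + \mathbb{P}(\mathcal{E}_3^c).$$

By assuming $s \le Cn/(\mu\lambda^2\log n)$ with a sufficiently small constant $C$, the exponent is at most $-2\log n$. Hence, $V_r \le 1/8$ with probability at least $1 - 5n^{-1}$.

Case 2: if $s < C_0\mu k\log n$, then $\|W^*u_i\|_2^2 \le 12\mu\sqrt{C_0\,s\,\mu k\log n}/n$. Again setting $t = 1/(8\lambda)$ and taking the union bound, we have

$$\mathbb{P}\left(V_r > \frac18\right) \le 2n\exp\left(-\frac{n}{1536\,\mu\lambda^2\sqrt{C_0\,s\,\mu k\log n}}\right) + \mathbb{P}(\mathcal{E}_1^c) + \mathbb{P}(\mathcal{E}_2^c) + \mathbb{P}(\mathcal{E}_3^c) \le 5n^{-1},$$

provided that $s \le Cn/(\mu\lambda^2\log n)$ and $k$ obeys the sparsity assumption of Theorem 2.

We complete the proof by employing the triangle inequality: $V \le V_1 + V_r \le 1/4$. □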

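The algebra behind Part 1 (the Neumann series for $(A_{J^cT^c}A^*_{J^cT^c})^{-1}$ and the orthogonality identities used to rewrite $V_r$, and later Part 2) can be checked numerically. This is a minimal sketch assuming NumPy, with a random real orthogonal matrix obtained by QR standing in for $A$; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, s, k = 64, 40, 6, 3
A, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random real orthogonal A
omega = np.sort(rng.choice(n, m, replace=False))
S = np.sort(rng.choice(omega, s, replace=False))
J = np.setdiff1d(omega, S)
Jc = np.setdiff1d(np.arange(n), J)                  # J^c = S U Omega^c
T = np.sort(rng.choice(n, k, replace=False))
Tc = np.setdiff1d(np.arange(n), T)

AJT, AJTc = A[np.ix_(J, T)], A[np.ix_(J, Tc)]
AJcT, AJcTc = A[np.ix_(Jc, T)], A[np.ix_(Jc, Tc)]
H = AJcT @ AJcT.T                                   # A_{J^cT} A*_{J^cT}

# Neumann series sum_{j>=0} H^j, truncated once the next term is negligible
series, term = np.zeros_like(H), np.eye(Jc.size)
for _ in range(100_000):
    if np.linalg.norm(term, 2) <= 1e-13:
        break
    series += term
    term = term @ H

checks = {
    "||H|| < 1": np.linalg.norm(H, 2) < 1,
    "I - H = A_{J^cT^c} A*_{J^cT^c}": np.allclose(np.eye(Jc.size) - H, AJcTc @ AJcTc.T),
    "Neumann series = inverse": np.allclose(series, np.linalg.inv(AJcTc @ AJcTc.T)),
    "A*_{J^cT^c} A_{J^cT} = -A*_{JT^c} A_{JT}": np.allclose(AJcTc.T @ AJcT, -AJTc.T @ AJT),
    "A_{JT^c} A*_{J^cT^c} = -A_{JT} A*_{J^cT}": np.allclose(AJTc @ AJcTc.T, -AJT @ AJcT.T),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The first identity follows from the rows of $A$ being orthonormal, the column identity from its columns being orthonormal; both are exact, while the truncated series matches the inverse up to the stopping tolerance.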


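Part 2. In this part, we need to show that, with high probability,

$$V := \lambda\left\|A_{JT^c}A^*_{J^cT^c}\left(A_{J^cT^c}A^*_{J^cT^c}\right)^{-1}\begin{bmatrix}\mathrm{sgn}(e^*_S)\\ 0_{\Omega^c}\end{bmatrix}\right\|_\infty \le \frac{\lambda}{4}.$$

Again by series expansion, we first have $(A_{J^cT^c}A^*_{J^cT^c})^{-1} = \sum_{j\ge0}(A_{J^cT}A^*_{J^cT})^j$. Moreover, since $A_{JT^c}A^*_{J^cT^c} = -A_{JT}A^*_{J^cT}$, we arrive at

$$A_{JT^c}A^*_{J^cT^c}\left(A_{J^cT^c}A^*_{J^cT^c}\right)^{-1}\begin{bmatrix}\mathrm{sgn}(e^*_S)\\ 0_{\Omega^c}\end{bmatrix} = -A_{JT}A^*_{J^cT}\sum_{j\ge0}\left(A_{J^cT}A^*_{J^cT}\right)^j\begin{bmatrix}\mathrm{sgn}(e^*_S)\\ 0_{\Omega^c}\end{bmatrix} = -A_{JT}\sum_{j\ge0}\left(A^*_{J^cT}A_{J^cT}\right)^jA^*_{ST}\,\mathrm{sgn}(e^*_S).$$

Let $W := \sum_{j\ge0}(A^*_{J^cT}A_{J^cT})^jA^*_{ST}$ and let $u_i \in \mathbb{R}^k$ be the $i$-th row of $A_{JT}$. We consider the quantity $V_i := |\langle W^*u_i^*, \mathrm{sgn}(e^*_S)\rangle|$. Analogous to the preceding proofs, Hoeffding's inequality is used to estimate $V_i$:

$$\mathbb{P}(V_i > t) \le 2\exp\left(-\frac{t^2}{2\|W^*u_i^*\|_2^2}\right).$$

The spectral norm of $W$ can be estimated as

$$\|W\| \le \sum_{j\ge0}\|A^*_{J^cT}A_{J^cT}\|^j\,\|A_{ST}\| = \frac{\|A_{ST}\|}{1 - \|A^*_{J^cT}A_{J^cT}\|}.$$

Conditioning on the events $\mathcal{E}_1$ and $\mathcal{E}_3$ of Part 1, we have $\|W\| \le (2/\rho_0)\|A_{ST}\|$ and $\|A_{ST}\|^2 \le (s + \sqrt{C_0\,s\,\mu k\log n})/n \le 2\eta$ whenever $m \ge C_0\mu k\log n$. Together with $\|u_i\|_2 \le \sqrt{\mu k/n}$, we get

$$\|W^*u_i^*\|_2 \le \|W\|\,\|u_i\|_2 \le \sqrt{\frac{8\eta\mu k}{\rho_0^2 n}}.$$

Setting $t = 1/4$ and taking the union bound over all $i \in J$,

$$\mathbb{P}\left(V > \frac{\lambda}{4}\ \Big|\ \mathcal{E}_1\cap\mathcal{E}_3\right) \le 2\exp\left(-\frac{\rho_0^2 n}{256\,\mu\eta k} + \log n\right).$$

The right-hand side is at most $2e^{-\log n} = 2n^{-1}$ as long as

$$\frac{\rho_0^2 n}{256\,\mu\eta k} = \frac{(m-s)^2}{256\,\mu k m} \ge 2\log n.$$

This is automatic from the assumptions that $k \le C(1-\gamma)^2m/(\mu\log n)$ and $s \le \gamma m$. □

V. Proofs of Theorems 3 and 4: Dealing with both sparse and dense errors

A. Proof of Theorem 3

Our proof technique is adapted from [29] (see also [30]), but in a different context: in [29], the authors studied the matrix completion problem under noisy observations, whereas we consider the conventional compressed sensing setting. Let $\hat x$ be the optimal solution of (9). Since $x^*$ is also a feasible solution of (9), $\|A_{\Omega\cdot}x^* - b\|_2 \le \sigma$. We have an important observation:

$$\|A_{\Omega\cdot}(\hat x - x^*)\|_2 \le \|A_{\Omega\cdot}\hat x - b\|_2 + \|A_{\Omega\cdot}x^* - b\|_2 \le 2\sigma. \qquad (32)$$

Denote $g = \hat x - x^*$; our goal is to establish a bound for $\|g\|_2$. First, note that $\|g\|_2 = \|Ag\|_2$ since $A$ is orthogonal, and hence

$$\|g\|_2^2 = \|A_{\Omega\cdot}g\|_2^2 + \|A_{\Omega^c\cdot}g\|_2^2 \le 4\sigma^2 + \|A_{\Omega^c\cdot}g\|_2^2. \qquad (33)$$

It now remains to bound the second term. Our strategy is to bound $\|A^*_{\Omega^cT}A_{\Omega^c\cdot}g\|_2$ and $\|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_2$ separately; the bound on $\|A_{\Omega^c\cdot}g\|_2$ is then obtained via the expression

$$\|A_{\Omega^c\cdot}g\|_2^2 = \|A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g\|_2^2 = \|A^*_{\Omega^cT}A_{\Omega^c\cdot}g\|_2^2 + \|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_2^2, \qquad (34)$$

where the first equality follows from $\|A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g\|_2^2 = \langle A_{\Omega^c\cdot}g, A_{\Omega^c\cdot}A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g\rangle = \langle A_{\Omega^c\cdot}g, A_{\Omega^c\cdot}g\rangle = \|A_{\Omega^c\cdot}g\|_2^2$.

To bound $\|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_2$, we bring Lemma 2 into action: for any perturbation pair $(h, 0)$ satisfying $A_{\Omega\cdot}h = 0$, we have

$$\|x^* + h\|_1 \ge \|x^*\|_1 + \frac14\|h_{T^c}\|_1. \qquad (35)$$

By setting $h = A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g$, we see that $A_{\Omega\cdot}h = 0$. Hence, applying Lemma 2 yields

$$\|x^* + A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g\|_1 \ge \|x^*\|_1 + \frac14\|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_1. \qquad (36)$$

Furthermore, $x^* + g$ is the optimal solution of the convex program (9), and $g = A^*_{\Omega\cdot}A_{\Omega\cdot}g + A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g$. This yields

$$\|x^*\|_1 \ge \|x^* + g\|_1 \ge \|x^* + A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g\|_1 - \|A^*_{\Omega\cdot}A_{\Omega\cdot}g\|_1.$$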



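In combination with (36), we obtain the important inequality

$$\frac14\|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_1 \le \|A^*_{\Omega\cdot}A_{\Omega\cdot}g\|_1.$$

Since the $\ell_1$-norm dominates the $\ell_2$-norm, and $\|a\|_1 \le \sqrt n\|a\|_2$ for any $n$-dimensional vector $a$, we have

$$\|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_2 \le \|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_1 \le 4\|A^*_{\Omega\cdot}A_{\Omega\cdot}g\|_1 \le 4\sqrt n\,\|A^*_{\Omega\cdot}A_{\Omega\cdot}g\|_2 = 4\sqrt n\,\|A_{\Omega\cdot}g\|_2. \qquad (37)$$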



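It remains to develop a bound for $\|A^*_{\Omega^cT}A_{\Omega^c\cdot}g\|_2$. We observe that $A_{\Omega T}A^*_{\Omega^cT} = -A_{\Omega T^c}A^*_{\Omega^cT^c}$ due to the orthogonality of $A$. Thus, for any vector $u$,

$$\|A_{\Omega T}A^*_{\Omega^cT}u\|_2 = \|A_{\Omega T^c}A^*_{\Omega^cT^c}u\|_2 \le \|A_{\Omega T^c}\|\,\|A^*_{\Omega^cT^c}u\|_2 \le \|A^*_{\Omega^cT^c}u\|_2.$$

In addition, applying Lemma 5 with $\Omega \sim \mathrm{Ber}(\eta)$, we have $\|I - \eta^{-1}A^*_{\Omega T}A_{\Omega T}\| \le \epsilon \le 1/2$ with high probability. Hence,

$$\eta^{-1}\|A_{\Omega T}A^*_{\Omega^cT}u\|_2^2 \ge \|A^*_{\Omega^cT}u\|_2^2 - \left\|I - \eta^{-1}A^*_{\Omega T}A_{\Omega T}\right\|\,\|A^*_{\Omega^cT}u\|_2^2 \ge \frac12\|A^*_{\Omega^cT}u\|_2^2.$$

In other words, $\sqrt{\eta/2}\,\|A^*_{\Omega^cT}u\|_2 \le \|A_{\Omega T}A^*_{\Omega^cT}u\|_2$. Combining these pieces while setting $u = A_{\Omega^c\cdot}g$ yields

$$\|A^*_{\Omega^cT}A_{\Omega^c\cdot}g\|_2 \le \sqrt{\frac{2}{\eta}}\,\|A^*_{\Omega^cT^c}A_{\Omega^c\cdot}g\|_2.$$

The right-hand side is in turn bounded by $4\sqrt{2n/\eta}\,\|A_{\Omega\cdot}g\|_2$ due to (37). Inserting this bound together with (37) into (34), we obtain

$$\|A_{\Omega^c\cdot}g\|_2^2 \le \left(\frac{2}{\eta} + 1\right)16n\,\|A_{\Omega\cdot}g\|_2^2 \le \left(\frac{2}{\eta} + 1\right)64\sigma^2 n,$$

where the last inequality follows from the bound (32). Combining this result with (33), we conclude that

$$\|g\|_2 \le 2\sigma + 8\sigma\sqrt{n\left(\frac{2}{\eta} + 1\right)},$$

as claimed.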

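The deterministic identities used in the proof of Theorem 3, namely (33), (34), and $A_{\Omega T}A^*_{\Omega^cT} = -A_{\Omega T^c}A^*_{\Omega^cT^c}$, together with the final step $\sqrt{a+b} \le \sqrt a + \sqrt b$, can be verified numerically. This sketch assumes NumPy and the unitary DFT; `g` is an arbitrary vector and `sigma` an illustrative noise level.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, k, sigma = 64, 32, 4, 0.1
A = np.fft.fft(np.eye(n), norm="ortho")            # unitary DFT
omega = np.sort(rng.choice(n, m, replace=False))
omega_c = np.setdiff1d(np.arange(n), omega)
T = np.sort(rng.choice(n, k, replace=False))
Tc = np.setdiff1d(np.arange(n), T)
g = rng.standard_normal(n)                          # an arbitrary residual vector
AO, AOc = A[omega], A[omega_c]

# (33): ||g||^2 = ||A_{Omega.} g||^2 + ||A_{Omega^c.} g||^2
eq33 = np.isclose(np.linalg.norm(g) ** 2,
                  np.linalg.norm(AO @ g) ** 2 + np.linalg.norm(AOc @ g) ** 2)
# (34): ||A_{Omega^c.} g||^2 splits over the T and T^c parts of A*_{Omega^c.} A_{Omega^c.} g
p = AOc.conj().T @ (AOc @ g)
eq34 = np.isclose(np.linalg.norm(AOc @ g) ** 2,
                  np.linalg.norm(p[T]) ** 2 + np.linalg.norm(p[Tc]) ** 2)
# A_{Omega T} A*_{Omega^c T} = -A_{Omega T^c} A*_{Omega^c T^c}
cross = np.allclose(A[np.ix_(omega, T)] @ A[np.ix_(omega_c, T)].conj().T,
                    -A[np.ix_(omega, Tc)] @ A[np.ix_(omega_c, Tc)].conj().T)
# Final step: sqrt(4 s^2 + 64 s^2 n (2/eta + 1)) <= 2 s + 8 s sqrt(n (2/eta + 1))
eta = m / n
tight = np.sqrt(4 * sigma**2 + 64 * sigma**2 * n * (2 / eta + 1))
stated = 2 * sigma + 8 * sigma * np.sqrt(n * (2 / eta + 1))
print(eq33, eq34, cross, tight <= stated)
```

None of these steps is probabilistic; randomness in the proof enters only through Lemma 2 and Lemma 5.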
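B. Proof of Theorem 4

Proof. The proof of this theorem is considerably more involved, since we have to control two residual components $\hat x - x^*$ and $\hat e - e^*$, where $(\hat x, \hat e)$ is the optimal solution pair of (12). Set $g^{(x)} = \hat x - x^*$ and $g^{(e)} = \hat e - e^*$; our goal is to bound $\|g^{(x)}\|_2 + \|g^{(e)}\|_2$.

First, since $(x^*, e^*)$ and $(\hat x, \hat e)$ are both feasible for (12), we establish the important bound

$$\|A_{\Omega\cdot}g^{(x)} + g^{(e)}\|_2 \le \|A_{\Omega\cdot}\hat x + \hat e - b\|_2 + \|A_{\Omega\cdot}x^* + e^* - b\|_2 \le 2\sigma. \qquad (38)$$

To bound $\|g^{(x)}\|_2^2 + \|g^{(e)}\|_2^2$, we first express $\|g^{(x)}\|_2^2 = \|Ag^{(x)}\|_2^2 = \|A_{\Omega\cdot}g^{(x)}\|_2^2 + \|A_{\Omega^c\cdot}g^{(x)}\|_2^2$. Furthermore, from the identity $\frac12\|a+b\|_2^2 + \frac12\|a-b\|_2^2 = \|a\|_2^2 + \|b\|_2^2$ for any vectors $a$ and $b$, together with (38), we get

$$\|g^{(x)}\|_2^2 + \|g^{(e)}\|_2^2 = \|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + \frac12\|A_{\Omega\cdot}g^{(x)} + g^{(e)}\|_2^2 + \frac12\|A_{\Omega\cdot}g^{(x)} - g^{(e)}\|_2^2 \le 2\sigma^2 + \|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + \frac12\|A_{\Omega\cdot}g^{(x)} - g^{(e)}\|_2^2. \qquad (39)$$

It remains to bound the sum of the second and third terms on the right-hand side. We express this sum as

$$\|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + \frac12\|A_{\Omega\cdot}g^{(x)} - g^{(e)}\|_2^2 = \|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + \frac12\|A_{S\cdot}g^{(x)} - g^{(e)}_S\|_2^2 + \frac12\|A_{J\cdot}g^{(x)} - g^{(e)}_J\|_2^2,$$

where we recall that $S$ contains the locations where measurements are available but unreliable, $J$ contains the locations where measurements are available and trustworthy, and $\Omega = S \cup J$. To upper bound this sum, we bound the terms $M_1 := \|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + \frac12\|A_{S\cdot}g^{(x)} - g^{(e)}_S\|_2^2$ and $M_2 := \frac12\|A_{J\cdot}g^{(x)} - g^{(e)}_J\|_2^2$ separately.

One of the crucial steps in bounding $M_1$ and $M_2$ is the use of Lemma 2, which states that for any perturbation pair $(h, f)$ satisfying $f = -A_{\Omega\cdot}h$,

$$\|x^* + h\|_1 + \lambda\|e^* + f\|_1 \ge \|x^*\|_1 + \lambda\|e^*\|_1 + \frac14\|h_{T^c}\|_1 + \frac{\lambda}{4}\|f_J\|_1.$$

Now let us denote

$$f^+ := -\frac12\left(A_{\Omega\cdot}g^{(x)} + g^{(e)}\right),\qquad f^- := -\frac12\left(A_{\Omega\cdot}g^{(x)} - g^{(e)}\right),$$

as well as

$$h^+ := -A^*_{\Omega\cdot}f^+,\qquad h^- := -A^*_{\Omega\cdot}f^- + A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g^{(x)}.$$

It is easy to establish the following properties from this construction:

$$g^{(x)} = h^+ + h^-,\quad g^{(e)} = -f^+ + f^-,\quad \|h^+\|_2 = \|f^+\|_2 \le \sigma,\quad M_1 = \|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + 2\|f^-_S\|_2^2,\quad M_2 = 2\|f^-_J\|_2^2. \qquad (40)$$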

1) Bound $M_2$: First, since $(x^* + g^{(x)}, e^* + g^{(e)})$ is the optimal solution pair of the convex program and $(x^*, e^*)$ is feasible, we have $\|x^*\|_1 + \lambda\|e^*\|_1 \ge \|x^* + g^{(x)}\|_1 + \lambda\|e^* + g^{(e)}\|_1$. Furthermore, decomposing $g^{(x)}$ and $g^{(e)}$ and using the triangle inequality, we can derive

$$\|x^* + g^{(x)}\|_1 + \lambda\|e^* + g^{(e)}\|_1 = \|x^* + h^+ + h^-\|_1 + \lambda\|e^* - f^+ + f^-\|_1 \ge \|x^* + h^-\|_1 + \lambda\|e^* + f^-\|_1 - \left(\|h^+\|_1 + \lambda\|f^+\|_1\right). \qquad (41)$$

Applying Lemma 2 together with the observation that $f^- = -A_{\Omega\cdot}h^-$ yields

$$\|x^* + h^-\|_1 + \lambda\|e^* + f^-\|_1 \ge \|x^*\|_1 + \lambda\|e^*\|_1 + \frac14\|h^-_{T^c}\|_1 + \frac{\lambda}{4}\|f^-_J\|_1.$$

Combining these arguments, we get

$$\|h^-_{T^c}\|_1 + \lambda\|f^-_J\|_1 \le 4\left(\|h^+\|_1 + \lambda\|f^+\|_1\right).$$

Converting both sides to the $\ell_2$-norm using the crude inequality $\|a\|_2 \le \|a\|_1 \le \sqrt n\|a\|_2$ for all $a \in \mathbb{R}^n$, and then applying $\|f^+\|_2 = \|h^+\|_2 \le \sigma$, we obtain the bound

$$\min\{\lambda, 1\}\left(\|f^-_J\|_2 + \|h^-_{T^c}\|_2\right) \le 4\sqrt n\,(1+\lambda)\,\sigma. \qquad (42)$$

A specific consequence of this analysis is a bound on $M_2$:

$$M_2 = 2\|f^-_J\|_2^2 \le 2\left(\|f^-_J\|_2 + \|h^-_{T^c}\|_2\right)^2 \le 2\left(\frac{4(1+\lambda)}{\min\{\lambda,1\}}\right)^2\sigma^2 n. \qquad (43)$$



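2) Bound $M_1$: In this part we bound

$$M_1 = \|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + 2\|f^-_S\|_2^2 \le 2\left(\|f^-_S\|_2^2 + \|A_{\Omega^c\cdot}g^{(x)}\|_2^2\right).$$

Denoting by $z := [-f^-_S;\ A_{\Omega^c\cdot}g^{(x)}]$ the vector indexed by $J^c = S \cup \Omega^c$, we have $\|z\|_2^2 = \|f^-_S\|_2^2 + \|A_{\Omega^c\cdot}g^{(x)}\|_2^2$, so bounding $M_1$ is equivalent to bounding $\|z\|_2^2$. By the construction of $h^-$ and $f^-$, we have $h^- - A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g^{(x)} = -A^*_{\Omega\cdot}f^- = -A^*_{S\cdot}f^-_S - A^*_{J\cdot}f^-_J$, leading to

$$\begin{bmatrix}A^*_{JT}f^-_J\\ h^-_{T^c} + A^*_{JT^c}f^-_J\end{bmatrix} = A^*_{J^c\cdot}z - \begin{bmatrix}h^-_T\\ 0\end{bmatrix}, \qquad (44)$$

where we used $J^c = S \cup \Omega^c$ to write $-A^*_{S\cdot}f^-_S + A^*_{\Omega^c\cdot}A_{\Omega^c\cdot}g^{(x)} = A^*_{J^c\cdot}z$.

First we upper bound the $\ell_2$-norm of the left-hand side of (44), which follows easily from the triangle inequality:

$$\left\|\begin{bmatrix}A^*_{JT}f^-_J\\ h^-_{T^c} + A^*_{JT^c}f^-_J\end{bmatrix}\right\|_2 \le \|A^*_{J\cdot}f^-_J\|_2 + \|h^-_{T^c}\|_2 = \|f^-_J\|_2 + \|h^-_{T^c}\|_2.$$

Next, the $\ell_2$-norm of the right-hand side of (44) is lower bounded by

$$\begin{aligned}\left\|A^*_{J^c\cdot}z - \begin{bmatrix}h^-_T\\ 0\end{bmatrix}\right\|_2^2 &= \|z\|_2^2 + \|h^-_T\|_2^2 - 2\,\mathrm{Re}\,\langle A^*_{J^cT}z, h^-_T\rangle\\ &\ge \|z\|_2^2 + \|h^-_T\|_2^2 - 2\|A_{J^cT}\|\,\|z\|_2\,\|h^-_T\|_2\\ &\ge \|z\|_2^2 + \|h^-_T\|_2^2 - 2\sqrt{1-\rho_0/2}\,\|z\|_2\,\|h^-_T\|_2\\ &\ge \left(1 - \sqrt{1-\rho_0/2}\right)\left(\|z\|_2^2 + \|h^-_T\|_2^2\right),\end{aligned}$$

where the equality uses $\|A^*_{J^c\cdot}z\|_2 = \|z\|_2$, the second inequality follows from Proposition 1 ($\|A^*_{J^cT}A_{J^cT}\| \le 1 - \rho_0/2$), and the last inequality follows from the elementary bound $a^2 + b^2 - 2\alpha ab \ge (1-\alpha)(a^2 + b^2)$ for $0 \le \alpha \le 1$, which holds because the difference of the two sides equals $\alpha(a-b)^2$.

Combining these pieces with the fact that $1 - \sqrt{1-\rho_0/2} \ge \rho_0/4$, we attain

$$\|z\|_2^2 + \|h^-_T\|_2^2 \le \frac{4}{\rho_0}\left(\|f^-_J\|_2 + \|h^-_{T^c}\|_2\right)^2. \qquad (45)$$

Together with (42), we get the following bound on $M_1$:

$$M_1 \le 2\left(\|z\|_2^2 + \|h^-_T\|_2^2\right) \le \frac{8}{\rho_0}\left(\|f^-_J\|_2 + \|h^-_{T^c}\|_2\right)^2 \le \frac{8}{\rho_0}\left(\frac{4(1+\lambda)}{\min\{\lambda,1\}}\right)^2\sigma^2 n. \qquad (46)$$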
Combining the two previous inequalities on $M_1$ and $M_2$ immediately bounds the sum $M_1 + M_2$. However, we can tighten this bound by a constant factor with the following simple steps:

$$\begin{aligned}M_1 + M_2 &\le 2\left(\|A_{\Omega^c\cdot}g^{(x)}\|_2^2 + \|f^-_S\|_2^2 + \|f^-_J\|_2^2\right) \le 2\left(\|z\|_2^2 + \|h^-_T\|_2^2 + \|f^-_J\|_2^2\right)\\ &\le 2\left(\frac{4}{\rho_0} + 1\right)\left(\|f^-_J\|_2 + \|h^-_{T^c}\|_2\right)^2 \le 2\left(\frac{4}{\rho_0} + 1\right)\left(\frac{4(1+\lambda)}{\min\{\lambda,1\}}\right)^2\sigma^2 n,\end{aligned}$$

where the third inequality follows from (45) and the last inequality follows from (42).

Inserting the above bound into (39) leads to

$$\|g^{(x)}\|_2^2 + \|g^{(e)}\|_2^2 \le 2\sigma^2 + 2\left(\frac{4}{\rho_0} + 1\right)\left(\frac{4(\lambda+1)}{\min\{1,\lambda\}}\right)^2\sigma^2 n.$$

Finally, applying $(\|g^{(x)}\|_2 + \|g^{(e)}\|_2)^2 \le 2(\|g^{(x)}\|_2^2 + \|g^{(e)}\|_2^2)$ completes our proof. □

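The splitting $(h^\pm, f^\pm)$ introduced before (40) and the identity behind (39) are purely algebraic, so they can be verified for arbitrary residuals. This sketch assumes NumPy and a random real orthogonal $A$ from a QR factorization; `gx` and `ge` are arbitrary stand-ins for $g^{(x)}$ and $g^{(e)}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 64, 40
A, _ = np.linalg.qr(rng.standard_normal((n, n)))     # random real orthogonal A
omega = np.sort(rng.choice(n, m, replace=False))
omega_c = np.setdiff1d(np.arange(n), omega)
AO, AOc = A[omega], A[omega_c]
gx = rng.standard_normal(n)                          # arbitrary residual g^(x)
ge = rng.standard_normal(m)                          # arbitrary residual g^(e)

f_plus = -0.5 * (AO @ gx + ge)
f_minus = -0.5 * (AO @ gx - ge)
h_plus = -AO.T @ f_plus
h_minus = -AO.T @ f_minus + AOc.T @ (AOc @ gx)

checks = {
    "g^(x) = h+ + h-": np.allclose(gx, h_plus + h_minus),
    "g^(e) = -f+ + f-": np.allclose(ge, -f_plus + f_minus),
    "||h+|| = ||f+||": np.isclose(np.linalg.norm(h_plus), np.linalg.norm(f_plus)),
    "f+ = -A_Omega h+": np.allclose(f_plus, -AO @ h_plus),
    "f- = -A_Omega h-": np.allclose(f_minus, -AO @ h_minus),
}
# Identity behind (39), before (38) is used to bound the middle term
lhs = gx @ gx + ge @ ge
rhs = (np.linalg.norm(AOc @ gx) ** 2
       + 0.5 * np.linalg.norm(AO @ gx + ge) ** 2
       + 0.5 * np.linalg.norm(AO @ gx - ge) ** 2)
checks["(39) identity"] = np.isclose(lhs, rhs)
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The two checks $f^\pm = -A_{\Omega\cdot}h^\pm$ are exactly what allows Lemma 2 to be applied to both pairs in the bound on $M_2$.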


VI. Oracle inequalities 

In this section we discuss the optimality of the reconstruction error bound in Theorem 4. In particular, we compare this result with the best possible accuracy one can achieve. Suppose an oracle informed us in advance of the locations $T$ of the nonzero coefficients of the signal as well as the locations $S$ of the nonzero coefficients of the sparse noise. Then one could use this valuable information to construct the ideal estimator pair $(\hat x^{\mathrm{Oracle}}, \hat e^{\mathrm{Oracle}})$ by least-squares projection. To see this, we decompose $y$ into two components, $y_S$ and $y_J$, where $y_J$ is not affected by the sparse error. Thus,

$$y_J = A_{JT}x^*_T + v_J.$$

Recall from (30) that $A^*_{JT}A_{JT}$ is invertible. In particular, $\rho_0/2 \le \sigma_{\min}(A^*_{JT}A_{JT}) \le \sigma_{\max}(A^*_{JT}A_{JT}) \le 3\rho_0/2$, where $\sigma_{\min}$ and $\sigma_{\max}$ are the minimum and maximum singular values of the matrix, respectively. Therefore, the least-squares solution of this linear system is

$$\hat x^{\mathrm{Oracle}}_T = \left(A^*_{JT}A_{JT}\right)^{-1}A^*_{JT}y_J.$$

The oracle error on the signal is then estimated by

$$\|\hat x^{\mathrm{Oracle}}_T - x^*_T\|_2 = \left\|\left(A^*_{JT}A_{JT}\right)^{-1}A^*_{JT}v_J\right\|_2 \le \left\|\left(A^*_{JT}A_{JT}\right)^{-1}\right\|\,\|A_{JT}\|\,\|v_J\|_2 \le \frac{2}{\rho_0}\sqrt{\frac{3\rho_0}{2}}\,\sigma = \sigma\sqrt{\frac{6}{\rho_0}}, \qquad (47)$$

where we used $\|H^{-1}\| = 1/\sigma_{\min}(H)$ for any invertible matrix $H$ and $\|v_J\|_2 \le \sigma$.

Now the oracle estimate of the error can be found from the identity on the set $S$, $y_S = A_{ST}x^*_T + e^*_S$. This leads to

$$\hat e^{\mathrm{Oracle}}_S = y_S - A_{ST}\hat x^{\mathrm{Oracle}}_T = e^*_S + A_{ST}\left(x^*_T - \hat x^{\mathrm{Oracle}}_T\right).$$

Recall from Proposition 2 that

$$\|A_{ST}\| \le (\eta\rho)^{1/2}\left(1 + \sqrt{\frac{C_0\,\mu k\log n}{s}}\right)^{1/2} = \eta^{1/2}\left(\frac{s}{m} + \frac{\sqrt{C_0\,s\,\mu k\log n}}{m}\right)^{1/2} \le \sqrt2,$$

provided that $m \ge C_0\mu k\log n$. We conclude that the oracle error on $e^*$ satisfies

$$\|\hat e^{\mathrm{Oracle}}_S - e^*_S\|_2 \le \sqrt2\,\|\hat x^{\mathrm{Oracle}}_T - x^*_T\|_2 \le \sigma\sqrt{\frac{12}{\rho_0}}.$$

In conclusion, with the help of the oracle, we have

$$\|\hat x^{\mathrm{Oracle}} - x^*\|_2 + \|\hat e^{\mathrm{Oracle}} - e^*\|_2 \le \left(\sqrt6 + \sqrt{12}\right)\frac{\sigma}{\sqrt{\rho_0}}$$

with adversarial noise. Consequently, our error bound in Theorem 4 loses a factor of $\sqrt n$ relative to the ideal bound achieved with the oracle's help.

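A minimal sketch of the oracle estimator described above is given below. It assumes NumPy and a random real orthogonal $A$; the helper `oracle_estimate` is hypothetical, and for simplicity the dense noise in the second experiment is placed only on the clean rows $J$, matching the model $y_J = A_{JT}x^*_T + v_J$.

```python
import numpy as np

def oracle_estimate(A_omega, y, T, S, J):
    """Hypothetical helper for the oracle estimator of Section VI.

    T indexes signal coordinates; S and J are positions in y (rows of
    A_omega). The signal is fit by least squares on the clean rows J, and
    the sparse error on S is whatever that fit leaves unexplained.
    """
    AJT = A_omega[np.ix_(J, T)]
    xT = np.linalg.lstsq(AJT, y[J], rcond=None)[0]  # (A*_{JT}A_{JT})^{-1} A*_{JT} y_J
    x = np.zeros(A_omega.shape[1])
    x[T] = xT
    e = np.zeros(y.size)
    e[S] = y[S] - A_omega[np.ix_(S, T)] @ xT
    return x, e

rng = np.random.default_rng(7)
n, m, k, s = 128, 64, 4, 10
A, _ = np.linalg.qr(rng.standard_normal((n, n)))
A_omega = A[np.sort(rng.choice(n, m, replace=False))]
T = np.sort(rng.choice(n, k, replace=False))
S = np.sort(rng.choice(m, s, replace=False))
J = np.setdiff1d(np.arange(m), S)
x_true = np.zeros(n)
x_true[T] = rng.standard_normal(k)
e_true = np.zeros(m)
e_true[S] = 50 * rng.choice([-1.0, 1.0], s)

# Noiseless case: the oracle recovers (x*, e*) exactly.
x0, e0 = oracle_estimate(A_omega, A_omega @ x_true + e_true, T, S, J)

# Dense noise on the clean rows: check the deterministic bound behind (47).
v = np.zeros(m)
v[J] = 0.01 * rng.standard_normal(J.size)
x1, _ = oracle_estimate(A_omega, A_omega @ x_true + e_true + v, T, S, J)
sv = np.linalg.svd(A_omega[np.ix_(J, T)], compute_uv=False)
err = np.linalg.norm(x1 - x_true)
bound = sv.max() / sv.min() ** 2 * np.linalg.norm(v[J])   # ||(A*A)^{-1}|| ||A_JT|| ||v_J||
print(np.allclose(x0, x_true), np.allclose(e0, e_true), err <= bound)
```

The middle inequality of (47) holds for every realization; the probabilistic part of the argument is only the estimate $\sigma_{\min}(A^*_{JT}A_{JT}) \ge \rho_0/2$ from (30).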
VII. Numerical experiments 

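In this section, we provide extensive simulations to confirm the validity of our theoretical results. Since the observation model can be expressed as $y = Bz^*$ (and $b = Bz^* + v$ in the presence of dense noise), where $B := [A_{\Omega\cdot},\ \lambda^{-1}I]$, $z^* := [x^{*T}, \lambda e^{*T}]^T$, and $I$ is the $m\times m$ identity matrix, the extended $\ell_1$-minimization in (5) and its noisy version in (12) can be recast as conventional $\ell_1$ programs

$$\min_z \|z\|_1 \ \ \text{s.t.}\ \ y = Bz \qquad\text{and}\qquad \min_z \|z\|_1 \ \ \text{s.t.}\ \ \|b - Bz\|_2 \le \sigma.$$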



In this section, we use the Homotopy solver introduced in [31] for our experiments. Another important implementation detail is the choice of the parameter $\lambda$. For moderate signal dimensions (e.g., $n \le 10^8$), we suggest setting $\lambda =$



,, n u/2 . With 

i(log n) L i 2 

this choice, up to 25% of the measurements are allowed to be corrupted, as presented in our theorems. Of course, if we know a priori that the signal is very sparse, reducing the value of $\lambda$ helps retrieve the signal from more heavily corrupted measurements.
In practical applications, we recommend A = y ~^^ jT7? as 
a "good-for-all" parameter. 

A. Exact recovery from grossly corrupted measurements 

We first illustrate exact recovery of the signal under gross errors, as provided in Theorem 2. We consider random signals $x^*$ of lengths $n \in \{1024, 2048, 4096, 8192\}$. For each $n$, we generate signals of sparsity $k$, where $k$ varies from 1 to 60 with step size 2. The magnitudes of the nonzero entries are Gaussian distributed, and their locations are chosen uniformly at random. For each sparsity level, the measurement matrix $A_\Omega$ is produced by selecting $m = 500$ rows uniformly at random from the Fourier matrix $A$. The error vector $e^*$ is generated to have a uniformly distributed support of cardinality $s = m/4$, and the signs of its nonzero entries are equally likely to be positive or negative. We set the magnitudes of $e^*$ such that $\|e^*\|_2 > 100\|x^*\|_2$. The reader should note that this setting yields observations that are heavily dominated by the noise.

Fig. 1. The probability of success as a function of signal sparsity for various signal dimensions. Here, a total of m = 500 measurements are observed and 1/4 of them are grossly corrupted.

For each value of the signal sparsity $k$, we repeat the experiment 100 times and record the probability of
exact recovery. In all experiments, we set A = </ m n ^ ra )i/2 • 



The algorithm is declared successful if the relative error with respect to $x^*$ satisfies $\|\hat{x} - x^*\|_2/\|x^*\|_2 < 10^{-3}$. The performance curves are plotted in Fig. 1. Values on the x-axis denote signal sparsity, and values on the y-axis denote the probability of exact recovery. Interestingly, this experiment demonstrates that the theory accurately predicts the simulation results even for relatively small problem sizes. In particular, perfect recovery is still attained for signals of moderate sparsity even when 25% of the measurements are grossly perturbed. Furthermore, the recoverable sparsity level behaves as the theory predicts.
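The data generation and success test for this experiment can be summarized in a few lines. The following is a minimal sketch under several assumptions:

- A real orthonormal DCT-II matrix replaces the Fourier matrix.
- The error entries have equal magnitude with random signs, rescaled to a hypothetical ratio $\|e^*\|_2/\|x^*\|_2 = 200$. The text only fixes the support, the signs, and the requirement $\|e^*\|_2 > 100\|x^*\|_2$.
- The function names `make_instance` and `is_success` are illustrative.

The success test is the relative-error criterion stated above.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix, a real stand-in for the Fourier matrix A."""
    j = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * j / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D


def make_instance(n, m, k, s, ratio=200.0, seed=None):
    """Hypothetical generator: m random rows of an orthonormal matrix,
    Gaussian k-sparse x*, and an s-sparse e* with random signs scaled so
    that ||e*||_2 = ratio * ||x*||_2 (any ratio above 100 fits the text)."""
    rng = np.random.default_rng(seed)
    A_omega = dct_matrix(n)[rng.choice(n, size=m, replace=False)]
    x_star = np.zeros(n)
    x_star[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    e_star = np.zeros(m)
    e_star[rng.choice(m, size=s, replace=False)] = rng.choice([-1.0, 1.0], size=s)
    e_star *= ratio * np.linalg.norm(x_star) / np.linalg.norm(e_star)
    y = A_omega @ x_star + e_star
    return A_omega, x_star, e_star, y


def is_success(x_hat, x_star, tol=1e-3):
    """Relative-error success criterion used for Fig. 1."""
    return np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star) < tol


A_omega, x_star, e_star, y = make_instance(n=1024, m=500, k=20, s=500 // 4, seed=0)
print(np.linalg.norm(e_star) / np.linalg.norm(x_star))
```

A full trial would pass `A_omega` and `y` to an extended $\ell_1$ solver and apply `is_success` to its output. Repeating this 100 times per sparsity level gives one point of a curve in Fig. 1.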

Next, we fix the signal dimension to $n = 1024$ and perform the same experiments with signal sparsity $k \in \{20, 25, 30\}$. Fig. 2 shows the probability of success as a function of the fraction of corrupted measurements $s/m$. As the signal's sparsity level increases, we expect to recover the signal only when fewer measurements are corrupted.

B. Stable recovery from both dense and sparse corrupted 
measurements 

We now demonstrate stable recovery when the measurements are contaminated by both gross sparse errors and small dense noise. We generate the small noise $v$ with i.i.d. $\mathcal{N}(0, \sigma^2)$ entries. The signal $x^*$, the sparse error $e^*$, and the measurement matrix $A_\Omega$ are constructed as in the previous experiments. For each setting, we run the simulation 100 times and report the average error.



Fig. 2. The probability of success as a function of the fraction error s/m. Here, the signal dimension is fixed to n = 1024, a total of m = 500 measurements are used, and the signal sparsity is k = [25, 30, 35].

Fig. 3. RMS error as a function of σ with n = 1024, m = 500, k = 20, and s = m/4.

Fig. 4. RMS error as a function of s with n = 1024, m = 500, k = 20, and σ = 1.

We first evaluate the performance of (12) with a signal $x^*$ whose dimension and sparsity level are fixed to $n = 1024$ and $k = 20$. We also set the number of measurements and the error sparsity to $m = 500$ and $s = m/4$. Nonzero entries of the signal and the sparse errors are i.i.d. $\mathcal{N}(0, 10)$. Estimation errors are quantified by the root-mean-square (RMS) errors, defined as $\|\hat{x} - x^*\|_2/\sqrt{n}$ and $\|\hat{e} - e^*\|_2/\sqrt{m}$, respectively. Fig. 3 shows the RMS errors for varying noise levels, together with the RMS errors of the oracle described in Section VI. Fig. 3 clearly illustrates that the RMS errors grow almost linearly with the noise level. Furthermore, the RMS errors attained by solving (12) are only about twice those achieved by the oracle.
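The two RMS measures can be computed directly from their definitions. This is a minimal sketch, assuming the normalizations by $\sqrt{n}$ and $\sqrt{m}$ stated above. The function name `rms_errors` is hypothetical, and averaging over the 100 trials would simply take the mean of its outputs.

```python
import numpy as np


def rms_errors(x_hat, x_star, e_hat, e_star):
    """Return (||x_hat - x*||_2 / sqrt(n), ||e_hat - e*||_2 / sqrt(m))."""
    rms_x = np.linalg.norm(x_hat - x_star) / np.sqrt(x_star.size)
    rms_e = np.linalg.norm(e_hat - e_star) / np.sqrt(e_star.size)
    return rms_x, rms_e


# Toy check: each RMS value equals sqrt(mean(squared entrywise error)).
x_hat, x_star = np.ones(4), np.zeros(4)
e_hat, e_star = np.zeros(9), 3.0 * np.ones(9)
rms_x, rms_e = rms_errors(x_hat, x_star, e_hat, e_star)
print(rms_x, rms_e)   # 1.0 3.0
```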



Now we fix $\sigma = 1$ and solve the optimization in (12) for varying values of the error sparsity. Fig. 4 shows that recovery becomes more accurate as $s$ decreases.



C. Experiments with images 

In our last experiment, we consider the problem of recovering an image from highly corrupted, undersampled Fourier coefficients. As before, the data are given by $y = A_\Omega x^* + e^* + v$. Here $A_\Omega$ is a partial Fourier matrix obtained by subsampling rows of the full 2D Fourier matrix $A$, $e^*$ is a sparse error vector whose nonzero entries can have arbitrarily large magnitudes, and $v$ is a small dense noise vector. In this experiment, $x^*$ is the Shepp-Logan phantom image (see Fig. 5), which is not sparse in the spatial domain but is sparse in the gradient domain. Therefore, to reconstruct $x^*$, we use the total variation (TV) criterion and solve

$$\min_{x,e}\ \|x\|_{TV} + \lambda\|e\|_1 \quad \text{s.t.} \quad \|y - A_\Omega x - e\|_2 \le \sigma, \qquad (49)$$

where $\|v\|_2 \le \sigma$ is assumed to be known and $\|x\|_{TV}$ is the $\ell_1$-norm of the gradient, also known as the total variation of $x$. This norm is formally defined as

$$\|x\|_{TV} = \sum_{i,j}\sqrt{(\nabla_h x)_{ij}^2 + (\nabla_v x)_{ij}^2}, \qquad (50)$$

where $\nabla_h$ and $\nabla_v$ denote the discrete finite-difference operators along the horizontal and vertical coordinates. To solve (49), we employ the classic alternating direction method (ADM) as presented in [32]. In this particular experiment, we perform a two-step algorithm:

1) We solve (49) via the ADM method. The optimal solution is denoted by $(\hat{x}, \hat{e})$.
2) Next, we select $J \subseteq \{1, \dots, m\}$ as the locations where the coefficients of $\hat{e}$ are zero or approximately zero. These locations correspond to reliable observations. Then, we solve the following optimization

$$\min_x\ \|x\|_{TV} \quad \text{s.t.} \quad \|y_J - A_J x\|_2 \le \sigma, \qquad (51)$$

where only clean observations are considered. The output of (51) is the final reconstruction.
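Two ingredients of the two-step procedure are easy to state in code: the TV norm (50) and the selection of the reliable set $J$ in step 2. The sketch below is illustrative only and makes two assumptions. It uses forward differences with a zero difference past the last row and column, a boundary convention the text does not specify. It also uses a hypothetical threshold `tol` to decide when an entry of $\hat{e}$ is approximately zero. It does not implement the ADM solver of [32].

```python
import numpy as np


def tv_norm(x):
    """Isotropic total variation (50) using forward differences; the
    difference past the last column/row is taken to be zero."""
    x = np.asarray(x, dtype=float)
    dh = np.zeros_like(x)
    dv = np.zeros_like(x)
    dh[:, :-1] = x[:, 1:] - x[:, :-1]   # horizontal differences
    dv[:-1, :] = x[1:, :] - x[:-1, :]   # vertical differences
    return float(np.sum(np.sqrt(dh ** 2 + dv ** 2)))


def reliable_set(e_hat, tol):
    """Step 2: indices J where the estimated sparse error is (nearly) zero."""
    return np.flatnonzero(np.abs(e_hat) <= tol)


step = np.array([[0, 1], [0, 1]])
print(tv_norm(np.ones((4, 4))), tv_norm(step))            # 0.0 2.0
print(reliable_set(np.array([0.0, 1e-9, 5.0, -3.0, 1e-12]), tol=1e-6))  # [0 1 4]
```

Because `np.abs` also accepts complex input, `reliable_set` applies unchanged to estimated errors on Fourier coefficients. The rows `y[J]` and `A_omega[J]` then define the constraint in (51).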
In this experiment, we sample 12267 Fourier coefficients of the 256 × 256 phantom image $x^*$ along 45 radial lines (shown in the top right of Fig. 5). We then select 50% of these coefficients uniformly at random and deliberately add to them a deterministic large noise vector whose magnitudes are twice the magnitudes of the corresponding Fourier coefficients. This process simulates the situation in which half of the observed Fourier coefficients are significantly corrupted during data acquisition. We note that the locations of these corrupted entries are unknown. All the Fourier coefficients are afterward contaminated by Gaussian noise with zero mean and standard deviation 0.01. Fig. 5 (bottom left and bottom right) shows the reconstructions obtained by minimizing TV only and by the aforementioned two-step algorithm, respectively.



In the optimization (49 



A is set to be 



It is clear 



m log n * 

that while the conventional TV minimization fails to recover the original image, our proposed method recovers the image almost exactly. Notably, the relative error $\|\hat{x} - x^*\|_2/\|x^*\|_2$ of our method is 0.0887.

Fig. 5. Top left: original 256 × 256 phantom image. Top right: Fourier-domain sampling positions with 45 radial lines. Bottom left: image recovered by TV minimization only. Bottom right: image recovered by our proposed optimization in (49).



VIII. Discussion and conclusion 

In this paper, we present a complete analysis of a surprising phenomenon: one can perfectly recover a sparse signal from grossly corrupted measurements by linear programming [1], even if the corruption affects a significant fraction of all the entries. More specifically, we establish an explicit connection between the sparsity levels of the signal and of the error. Our result can be interpreted as a generalization of compressed sensing in which the measurements are both incomplete and corrupted by sparse errors. Furthermore, our results indicate that robustness is retained even in a more challenging situation. The convex program (12) can stably recover a sparse signal from measurements perturbed by both gross sparse errors and small dense errors. In particular, the recovery error lies within a constant multiple of the dense noise level. We also establish stable recovery for a much more general class of signals, namely approximately sparse signals.

As exhibited in Theorem 1, the signal sparsity $k$ is still allowed to be proportional to $\frac{m}{\mu^2\log^2 n}$ while retaining accurate recovery, even when the fraction of errors is close to 1, that is, when most of the measurements are corrupted. We conjecture that this bound is optimal. That is, perfect reconstruction cannot be achieved when $k$ is of larger order and the error support size $s$ is close to $m$. In fact, we address this conjecture in our upcoming paper for a class of Gaussian measurement matrices [33]. How to establish a similar result for suborthogonal measurement matrices is an interesting open problem.

We would like to mention related work that describes a similar phenomenon. Recently, Candès et al. [13], [34], Chandrasekaran et al. [35], Xu et al. [36], and Agarwal et al. [37] have shown that one can exactly recover a low-rank matrix $L \in \mathbb{R}^{n_1 \times n_2}$ from its grossly corrupted entries $M = L + S$ by solving the following convex program:

$$\min_{L,S}\ \|L\|_* + \lambda\|S\|_1 \quad \text{subject to} \quad M = L + S. \qquad (52)$$

More specifically, the authors of [13], [34] proved the following: as long as the rank of $L$ is of sufficiently small order relative to $\max\{n_1, n_2\}$, the solution of (52) with an appropriate choice of the parameter $\lambda$ is exact, even if almost all entries of $L$ are arbitrarily perturbed. Interestingly, the results in these papers share behavior similar to that presented here. We believe that similar phenomena also hold for other high-dimensional signal and error models.

IX. Appendix

Proof of Corollary 7: First, we observe a variant of Lemma 2. Assume the existence of a dual vector $(z^{(x)}, z^{(e)})$ satisfying the properties of Lemma 2. Then for any perturbation pair $(h, f)$ such that $f = -A_\Omega h$, we have

$$\|x^* + h\|_1 + \lambda\|e^* + f\|_1 \ge \|x^*_T\|_1 + \lambda\|e^*\|_1 - \|x^*_{T^c}\|_1 + c_0\left(\|h_{T^c}\|_1 + \lambda\|f_{S^c}\|_1\right), \qquad (53)$$

where $c_0 > 0$ denotes the constant from Lemma 2. The proof is essentially analogous to that of Lemma 2; the only difference is the non-sparse nature of $x^*$. We decompose $x^*$ into $x^*_T$ and $x^*_{T^c}$ and use the triangle inequality to lower-bound $\|x^* + h\|_1$. This gives

$$\|x^* + h\|_1 + \lambda\|e^* + f\|_1 \ge \|x^*_T + h\|_1 + \lambda\|e^* + f\|_1 - \|x^*_{T^c}\|_1.$$

Applying Lemma 2 to bound $\|x^*_T + h\|_1 + \lambda\|e^* + f\|_1$ leads to inequality (53).

We now follow the proof of Theorem 4 closely, except that we employ inequality (53) when bounding the quantity $M_2$. With the same notation, we have

$$\|x^* + g^{(x)}\|_1 + \lambda\|e^* + g^{(e)}\|_1 \le \|x^*\|_1 + \lambda\|e^*\|_1 = \|x^*_T\|_1 + \|x^*_{T^c}\|_1 + \lambda\|e^*\|_1.$$

We then use the lower bound on $\|x^* + g^{(x)}\|_1 + \lambda\|e^* + g^{(e)}\|_1$ in (41) together with (53). This yields a result similar to (42), with the additional term $2\|x^*_{T^c}\|_1$ on the right-hand side.



The rest of our proof follows exactly from the analysis of Theorem 4. □

Proof of Lemma: The proof is essentially analogous to the one presented in [14]. We first establish a bound for $\mathbb{E}\|A^*_{S_0T}u\|_2$ and then show that $\|A^*_{S_0T}u\|_2$ concentrates around its expectation.

Define $S_0 = \{i : \delta_i = 1\}$, where $\{\delta_i\}$ is an independent sequence of Bernoulli variables with parameter $p_0$. Denote by $v_i \in \mathbb{R}^k$ the $i$th column of the matrix $A^*_T$, so that $A^*_{S_0T}u = \sum_{i=1}^n \delta_i u_i v_i$. With these notations, we have

$$\mathbb{E}\,A^*_{S_0T}u = p_0\sum_{i=1}^n u_i v_i.$$

By the orthogonality property of $A$, $\sum_{i=1}^n u_i v_i = A^*_T u = 0$, where $u$ is a column of the matrix $A_{T^c}$. Thus, subtracting this zero term from $A^*_{S_0T}u$ shows that $A^*_{S_0T}u$ is a sum of zero-mean random vectors:

$$A^*_{S_0T}u = \sum_{i=1}^n (\delta_i - p_0)\,u_i v_i.$$

We can now estimate $\mathbb{E}\|A^*_{S_0T}u\|_2^2$ as follows:

$$\mathbb{E}\|A^*_{S_0T}u\|_2^2 = \mathbb{E}\sum_{i=1}^n (\delta_i - p_0)^2 u_i^2\langle v_i, v_i\rangle + \mathbb{E}\sum_{i \ne j}(\delta_i - p_0)(\delta_j - p_0)\,u_i u_j\langle v_i, v_j\rangle.$$

The second term vanishes because the $\delta_i$, $i = 1, \dots, n$, are independent and $\mathbb{E}(\delta_i - p_0) = 0$. Furthermore, $\mathbb{E}(\delta_i - p_0)^2 = p_0(1 - p_0) \le p_0$. Hence,

$$\mathbb{E}\|A^*_{S_0T}u\|_2^2 \le p_0\max_i\|v_i\|_2^2\Big(\sum_{i=1}^n u_i^2\Big) = p_0\max_i\|v_i\|_2^2\,\|u\|_2^2 \le p_0\frac{\mu k}{n}.$$

Therefore, by Jensen's inequality, we conclude that

$$\mathbb{E}\|A^*_{S_0T}u\|_2 \le \sqrt{\mathbb{E}\|A^*_{S_0T}u\|_2^2} \le \sqrt{p_0\frac{\mu k}{n}}.$$

We now apply a remarkable result of Talagrand that bounds the supremum of a sum of independent random variables. Let $Z_1, \dots, Z_n$ be a sequence of independent random variables and let $M$ be the supremum defined by

$$M = \sup_{g\in\mathcal{G}}\sum_{i=1}^n g(Z_i),$$

where $\mathcal{G}$ is a family of real-valued functions.

Theorem 5. Suppose $|g| \le B$ for every $g \in \mathcal{G}$ and $\{g(Z_i)\}_{i=1,\dots,n}$ have zero mean for every $g \in \mathcal{G}$. Then for all $t > 0$,

$$\mathbb{P}\left(|M - \mathbb{E}M| > t\right) \le 3\exp\left(-\frac{t}{C_T B}\log\left(1 + \frac{Bt}{\sigma^2 + B\,\mathbb{E}\bar{M}}\right)\right),$$

where $\sigma^2 = \sup_{g\in\mathcal{G}}\sum_{i=1}^n\mathbb{E}g^2(Z_i)$, $\bar{M} = \sup_{g\in\mathcal{G}}\left|\sum_{i=1}^n g(Z_i)\right|$, and $C_T$ is a small numerical constant.

By the definition of the norm, we have

$$M := \|A^*_{S_0T}u\|_2 = \max_{\|g\|_2 = 1}\langle A^*_{S_0T}u, g\rangle = \max_{\|g\|_2 = 1}\sum_{i=1}^n(\delta_i - p_0)\,u_i\langle v_i, g\rangle.$$

Denote $Z_i = (\delta_i - p_0)u_i v_i$. Then $M$ is the supremum of a sum of independent random variables $g(Z_i)$, where $g(Z_i) := (\delta_i - p_0)u_i\langle v_i, g\rangle$. Since $M \ge 0$, $\mathbb{E}\bar{M} = \mathbb{E}M$. The absolute value of $g(Z_i)$ is bounded by

$$|g(Z_i)| \le \|(\delta_i - p_0)u_i v_i\|_2\,\|g\|_2 \le |u_i|\,\|v_i\|_2 \le \frac{\mu}{n}\sqrt{k} := B.$$

In addition, we use $\mathbb{E}(\delta_i - p_0)^2 = p_0(1 - p_0)$ and the identity $\sum_{i=1}^n v_i v_i^* = A^*_T A_T = I$, which follows from the orthogonality property of $A$. These give

$$\sum_{i=1}^n\mathbb{E}g^2(Z_i) = \sum_{i=1}^n p_0(1 - p_0)\,u_i^2\langle v_i, g\rangle^2 \le p_0\max_i u_i^2\; g^*\Big(\sum_{i=1}^n v_i v_i^*\Big)g \le p_0\frac{\mu}{n}\|g\|_2^2.$$

Then, $\sigma^2 \le \max_{\|g\|_2 = 1} p_0\frac{\mu}{n}\|g\|_2^2 = p_0\frac{\mu}{n}$. Applying Talagrand's inequality yields

$$\mathbb{P}\left(M \ge \mathbb{E}M + t\right) \le 3\exp\left(-\frac{t}{C_T\sqrt{k}\,\mu/n}\log\left(1 + \frac{\sqrt{k}\,t}{p_0 + k(p_0\mu/n)^{1/2}}\right)\right). \qquad (54)$$



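We need to consider two cases.

1) If $p_0 \ge k\sqrt{p_0\mu/n}$, or equivalently $p_0 \ge \frac{\mu k^2}{n}$, we select $t$ such that $t \le p_0/\sqrt{k}$. Then the denominator inside the logarithm in (54) is at most $2p_0$, so the right-hand side of (54) is bounded by
$$3\exp\left(-\frac{t}{C_T\sqrt{k}\,\mu/n}\log\left(1 + \frac{\sqrt{k}\,t}{2p_0}\right)\right),$$
which is in turn smaller than $3\exp\left(-\frac{t^2}{3C_T p_0\mu/n}\right)$ due to the simple observation that $\log(1 + x) \ge 2x/3$ for $0 \le x \le 1$. Setting $t^2 := Cp_0\frac{\mu}{n}\log n$ with $C = 15C_T$ makes the right-hand side of (54) less than $3e^{-\log n} = 3n^{-1}$. This choice of $t$ is consistent with the condition $t \le p_0/\sqrt{k}$ as long as $p_0 \ge C\frac{\mu k\log n}{n}$. We conclude that in this case
$$\mathbb{P}\left(M \ge \sqrt{p_0\frac{\mu k}{n}} + \sqrt{Cp_0\frac{\mu}{n}\log n}\right) \le 3n^{-1}.$$
In other words, with high probability, $M \le \sqrt{C'p_0\frac{\mu}{n}\max\{k, \log n\}}$.

2) On the other hand, if $p_0 < \frac{\mu k^2}{n}$, we select $t$ such that $t \le \sqrt{p_0 k\mu/n}$. Then the denominator inside the logarithm in (54) is at most $2k\sqrt{p_0\mu/n}$, so the right-hand side of (54) is now less than
$$3\exp\left(-\frac{t}{C_T\sqrt{k}\,\mu/n}\log\left(1 + \frac{t}{2(p_0k\mu/n)^{1/2}}\right)\right) \le 3\exp\left(-\frac{t^2}{3C_T k\sqrt{p_0}\,(\mu/n)^{3/2}}\right).$$
Similarly, setting $t^2 := Ck\sqrt{p_0}\left(\frac{\mu}{n}\right)^{3/2}\log n$ makes the right-hand side of (54) less than $3n^{-1}$. This choice of $t$ is consistent with its bound as long as $p_0 \ge C^2\frac{\mu\log^2 n}{n}$. Therefore,
$$\mathbb{P}\left(M \ge \sqrt{p_0\frac{\mu k}{n}} + \sqrt{Ck\sqrt{p_0}\left(\frac{\mu}{n}\right)^{3/2}\log n}\right) \le 3n^{-1}.$$
In other words, with high probability, $M \le \sqrt{C'p_0\frac{\mu k}{n}}$, and the proof is complete.

□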
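The first-moment part of the proof above can be sanity-checked numerically. The sketch below is only illustrative. It assumes a real orthonormal DCT-II matrix for $A$, hypothetical values of $n$, $k$ and $p_0$, a random support $T$, and $u$ taken as a column of $A_{T^c}$ as in the proof. It performs four checks:

- it verifies $\sum_i u_i v_i = 0$;
- it verifies $\sum_i v_i v_i^* = I$;
- it evaluates the exact second moment $p_0(1-p_0)\sum_i u_i^2\|v_i\|_2^2$ and compares it with the bound $p_0\mu k/n$;
- it compares the exact second moment with a Monte Carlo estimate.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix, a real stand-in for A."""
    j = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * j / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D


rng = np.random.default_rng(1)
n, k, p0 = 128, 6, 0.3                        # hypothetical sizes
A = dct_matrix(n)
T = rng.choice(n, size=k, replace=False)
Tc = np.setdiff1d(np.arange(n), T)
u = A[:, Tc[0]]                               # a column of A_{T^c}
V = A[:, T].T                                 # column i of V is v_i in R^k
mu = n * np.max(A ** 2)                       # coherence mu = n max_ij A_ij^2

zero_term = V @ u                             # sum_i u_i v_i = A_T^* u
gram = V @ V.T                                # sum_i v_i v_i^* = A_T^* A_T

# Exact E||sum_i (delta_i - p0) u_i v_i||_2^2 and the bound p0 * mu * k / n.
second_moment = p0 * (1 - p0) * np.sum(u ** 2 * np.sum(V ** 2, axis=0))
bound = p0 * mu * k / n

# Monte Carlo estimate of the same second moment.
trials = 20000
delta = rng.random((trials, n)) < p0          # Bernoulli(p0) selectors
samples = ((delta - p0) * u) @ V.T            # row t: one draw of A_{S0 T}^* u
mc = np.mean(np.sum(samples ** 2, axis=1))
print(f"exact {second_moment:.4e}, Monte Carlo {mc:.4e}, bound {bound:.4e}")
```

The gap between the exact second moment and `bound` reflects the two relaxations in the proof: $p_0(1-p_0) \le p_0$, and the replacement of each $\|v_i\|_2^2$ by its maximum.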